Preface



This volume contains the proceedings of the Latin American Theoretical Infor- 
matics (LATIN) conference that was held in Buenos Aires, Argentina, April 5-8, 
2004. 

The LATIN series of symposia was launched in 1992 to foster interactions
between the Latin American community and computer scientists around the
world. This was the sixth event in the series, following São Paulo, Brazil (1992),
Valparaíso, Chile (1995), Campinas, Brazil (1998), Punta del Este, Uruguay
(2000), and Cancún, Mexico (2002). The proceedings of these conferences were
also published by Springer-Verlag in the Lecture Notes in Computer Science
series: Volumes 583, 911, 1380, 1776, and 2286, respectively. Also, as before, we
published a selection of the papers in a special issue of a prestigious journal.

We received 178 submissions, 80% more than the previous record for the
number of submissions. Each paper was assigned to four program committee
members, and 59 papers were selected. We feel lucky to have been able
to build on the solid foundation provided by the increasingly successful previous
LATINs, and we are very grateful for the tireless work of Pablo Martinez Lopez,
the Local Arrangements Chair. Finally, we thank Springer-Verlag for publishing
these proceedings in its LNCS series.



December 2003 



Martin Farach-Colton 
Analysis of Scheduling Algorithms for
Proportionate Fairness 



Mike Paterson 

Department of Computer Science 
University of Warwick, Coventry, UK 



Abstract. We consider a multiprocessor operating system in which each 
current job is guaranteed a given proportion over time of the total pro- 
cessor capacity. A scheduling algorithm allocates units of processor time 
to appropriate jobs at each time step. We measure the goodness of such 
a scheduler by the maximum amount by which the cumulative processor 
time for any job ever falls below the “fair” proportion guaranteed in the 
long term. 

In particular we focus our attention on very simple schedulers which 
impose minimal computational overheads on the operating system. For 
several such schedulers we obtain upper and lower bounds on their devi- 
ations from fairness. The scheduling quality which is achieved depends 
quite considerably on the relative processor proportions required by each 
job. 

We will outline the proofs of some of the upper and lower bounds, both 
for the unrestricted problem and for restricted versions where constraints 
are imposed on the processor proportions. Many problems remain to be 
investigated and we will give the results of some exploratory simulations. 
This is joint research with Micah Adler, Petra Berenbrink, Tom Friedet- 
zky, Leslie Ann Goldberg and Paul Goldberg. 
Advances in the Regularity Method



Yoshiharu Kohayakawa* 

Instituto de Matemática e Estatística, Universidade de São Paulo
Rua do Matão 1010, 05508-090 São Paulo, Brazil
yoshi@ime.usp.br



A beautiful result of Szemerédi on the asymptotic structure of graphs is his
regularity lemma. Roughly speaking, his result tells us that any large graph may
be written as a union of a bounded number of induced, random looking bipartite
graphs (the so called $\varepsilon$-regular pairs). Many applications of the regularity lemma
are based on the following fact, often referred to as the counting lemma: Let $G$ be
an $s$-partite graph with vertex partition $V(G) = \bigcup_{i=1}^{s} V_i$, where $|V_i| = m$ for all $i$
and all pairs $(V_i, V_j)$ are $\varepsilon$-regular of density $d$. Then $G$ contains $(1 + f(\varepsilon))\,d^{\binom{s}{2}} m^s$
cliques $K_s$, where $f(\varepsilon) \to 0$ as $\varepsilon \to 0$. The combined application of the regularity
lemma followed by the counting lemma is now often called the regularity method.

In recent years, considerable advances have occurred in the applications of 
the regularity method, of which we mention two: (i) the regularity lemma and
the counting lemma have been generalized to the hypergraph setting and (ii) the 
case of sparse graphs is now much better understood. 

In the sparse setting, that is, when $n$-vertex graphs with $o(n^2)$ edges are
involved, most applications have so far dealt with random graphs. In this talk, 
we shall discuss a new approach that allows one to apply the regularity method 
in the sparse setting in purely deterministic contexts. 

We cite an example. Random graphs are known to have several fault-tolerance
properties. The following result was proved by Alon, Capalbo, Rödl, Ruciński,
Szemerédi, and the author, making use of the regularity method, among others.
The random bipartite graph $G = G(n, n, p)$, with $p = cn^{-1/2k}(\log n)^{1/2k}$ and $k$
a fixed positive integer, has the following fault-tolerance property with high
probability: for any fixed $0 < \alpha < 1$, if $c$ is large enough, even after the removal
of any $\alpha$-fraction of the edges of $G$, the resulting graph still contains all bipartite
graphs with at most $a(\alpha)n$ vertices in each vertex class and maximum degree at
most $k$, for some $a\colon [0, 1) \to (0, 1]$.

Clearly, the above result implies that certain sparse fault-tolerant bipartite
graphs exist. With the techniques discussed in this talk, one may prove that the
celebrated norm-graphs of Kollár, Rónyai, and Szabó, of suitably chosen density,
are concrete examples.

This is joint work with V. Rödl and M. Schacht (Emory University, Atlanta).



* Partially supported by MCT/CNPq (ProNEx Project Proc. CNPq 664107/1997-4) 
and by CNPq (Proc. 300334/93-1 and 468516/2000-0) 
Fighting Spam: The Science



Cynthia Dwork 

Microsoft Research, Silicon Valley Campus; 1065 La Avenida, Mountain View,
CA 94043 USA; dwork@microsoft.com



Consider the following simple approach to fighting spam [5] : 

If I don’t know you, and you want your e-mail to appear in my inbox, 
then you must attach to your message an easily verified “proof of com- 
putational effort”, just for me and just for this message. 

If the proof of effort requires 10 seconds to compute, then a single machine 
can send only 8,000 messages per day. The popular press estimates the daily 
volume of spam to be about 12-15 billion messages [4,6]. At the 10-second price, 
this rate of spamming would require at least 1,500,000 machines, working full 
time. 
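The arithmetic behind these figures can be checked directly. The following Python sketch is only an illustration: the 10-second price and the lower estimate of 12 billion messages per day come from the text, while the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def messages_per_machine_per_day(proof_seconds):
    """Messages one machine can send per day at the given proof-of-effort price."""
    return SECONDS_PER_DAY // proof_seconds


def machines_needed(daily_volume, per_machine):
    """Machines required, working full time, to send daily_volume messages."""
    return math.ceil(daily_volume / per_machine)


per_machine = messages_per_machine_per_day(10)
print(per_machine)                                    # exact count for a 10-second proof
print(machines_needed(12_000_000_000, 8_000))         # with the rounded 8,000 used in the text
print(machines_needed(12_000_000_000, per_machine))   # with the exact per-machine count
```

With the rounded figure of 8,000 messages per machine, 12 billion messages per day correspond to 1,500,000 machines, which matches the estimate quoted above; the exact per-machine count gives a somewhat smaller number of the same order.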

The proof of effort can be the output of an appropriately chosen moderately 
hard function of the message, the recipient’s e-mail address, and the date and 
time. To send the same message to multiple recipients requires multiple compu- 
tations, as the e-mail addresses vary. Similarly, to send the same (or different) 
messages, repeatedly, to a single recipient requires repeated computation, as the 
dates and times (or messages themselves) vary. 

Initial proposals for the function [5,2] were CPU-intensive. To decrease
disparities between machines, Burrows proposed replacing the original
CPU-intensive pricing functions with memory-intensive functions, a suggestion
first investigated in [1].

Although the architecture community has been discussing the so-called 
“memory wall” - the point at which the memory access speeds and CPU speeds 
have diverged so much that improving the processor speed will not decrease 
computation time - for almost a decade [7], there has been little theoretical 
study of the memory-access costs of computation. A rigorous investigation of 
memory-bound pricing functions appears in [3], where several candidate func- 
tions (including those in [1]) are analyzed, and a new function is proposed. An 
abstract version of the new function is proven to be secure against amortization 
by a spamming adversary. 
The Consequences of Imre Simon’s Work in the
Theory of Automata, Languages, and 
Semigroups 



Jean-Eric Pin 

CNRS / Universite Paris VII, France 



Abstract. In this lecture, I will show how influential the work of Imre
has been in the theory of automata, languages and semigroups. I will
mainly focus on two celebrated problems, the restricted star-height problem
(solved) and the decidability of the dot-depth hierarchy (still open).
These two problems led to surprising developments and are currently
the topic of very active research. I will present the prominent results of
Imre on both topics, and demonstrate how these results have been the
driving force of the research in this area for the last thirty years.



Farach-Colton (Ed.): LATIN 2004, LNCS 2976, p. 5, 2004. 
Springer- Verlag Berlin Heidelberg 2004 







Querying Priced Information in Databases: 
The Conjunctive Case 
Extended Abstract 

Sany Laber*, Renato Carmo**, and Yoshiharu Kohayakawa***

^ Departamento de Informatica da Pontificia Universidade Catolica do Rio de Janeiro 
R. Marques de Sao Vicente 225, Rio de Janeiro RJ, Brazil 
^ Instituto de Matemática e Estatística da Universidade de São Paulo
Abstract. Query optimization that involves expensive predicates has
received considerable attention in the database community. Typically, 
the output to a database query is a set of tuples that satisfy certain con- 
ditions, and, with expensive predicates, these conditions may be com- 
putationally costly to verify. In the simplest case, when the query looks 
for the set of tuples that simultaneously satisfy k expensive predicates, 
the problem reduces to ordering the evaluation of the predicates so as to 
minimize the time to output the set of tuples comprising the answer to 
the query. 

Here, we give a simple and fast deterministic $k$-approximation algorithm
for this problem, and prove that $k$ is the best possible approximation
ratio for a deterministic algorithm, even if exponential time algorithms
are allowed. We also propose a randomized, polynomial time
algorithm with expected approximation ratio $1 + \sqrt{2}/2 \approx 1.707$ for $k = 2$,
and prove that 3/2 is the best possible expected approximation ratio for
randomized algorithms.



1 Introduction 

The main goal of query optimization in databases is to determine how a query 
over a database should be processed in order to minimize the user response 
time. A typical query extracts the tuples from a database relation that satisfy a 
set of conditions, or predicates, in database terminology. For example, consider 

* Partially supported by FAPERJ (Proc. E-26/150. 715/2003) and CNPq (Proc. 
476817/2003-0) 

** Partially supported by CAPES (PICDT) and CNPq (Proc. 476817/2003-0) 

* * * Partially supported by MCT /CNPq (ProNEx, Proc. CNPq 664107/1997-4) and CNPq 
(Proc. 300334/93-1, 468516/2000-0 and Proc. 476817/2003-0) 

Farach-Colton (Ed.): LATIN 2004, LNCS 2976, pp. 6-15, 2004. 

Springer- Verlag Berlin Heidelberg 2004 







the set of tuples $D = \{(a_1, b_1), (a_1, b_2), (a_1, b_3), (a_2, b_1)\}$ (see Fig. 1(a)) and a
conjunctive query that seeks to extract the subset of tuples $(a_i, b_j)$ for which $a_i$
satisfies predicate $P_1$ and $b_j$ satisfies predicate $P_2$. Clearly, these predicates can
be viewed together as a 0/1-valued function $\delta$ defined on the set of tuple elements
$\{a_1, a_2, b_1, b_2, b_3\}$, with the convention that $\delta(a_i) = 1$ if and only if $P_1(a_i)$ holds
and $\delta(b_j) = 1$ if and only if $P_2(b_j)$ holds. The answer to the query is the set of
pairs $(a_i, b_j)$ with $\delta(a_i, b_j) = \delta(a_i)\delta(b_j) = 1$. The query optimization problem
that we consider is that of determining a strategy for evaluating $\delta$ so as to
compute this set of tuples by evaluating as few values of the function $\delta$ as possible
(or, more generally, with the total cost for evaluating the function $\delta$ minimal).

It is usually the case that the cost (measured as the computational time) 
needed to evaluate the predicates of a query can be assumed to be bounded by 
a constant so that the query can be answered by just scanning through all the 
tuples in D while evaluating the corresponding predicates. 

In the case of computationally expensive predicates, however, e.g., when the
database holds complex data such as images and tables, this constant may happen
to be so large as to render this strategy impractical. In such cases, the different 
costs involved in evaluating each predicate must also be taken into account in 
order to keep user response time within reasonable bounds. 

Among several proposals to model and solve this problem (see, for example, 
[1,3,5]), we focus on the improvement of the approach proposed in [8] where, 
differently from the others, the query evaluation problem is reduced to an opti- 
mization problem on a hypergraph (see Fig. 1). 



1.1 Problem Statement 

A hypergraph is a pair $G = (V(G), E(G))$ where $V(G)$, the set of vertices of $G$,
is a finite set and each edge $e \in E(G)$ is a non-empty subset of $V(G)$.

The size of the largest edge in $G$ is called the rank of $G$ and is denoted
$r(G)$. A hypergraph $G$ is said to be uniform if each edge has size $r(G)$, and is
said to be $k$-partite if there is a partition $\{V_1, \dots, V_k\}$ of $V(G)$ such that no
edge contains more than one vertex in the same partition class. A matching in
a hypergraph $G$ is a set $M \subseteq E(G)$ with no two edges in $M$ sharing a common
vertex. A hypergraph $G$ is said to be a matching if $E(G)$ is a matching.

Given a hypergraph $G$ and a function $\delta\colon V(G) \to \{0, 1\}$ we define an evaluation
of $(G, \delta)$ as a set $E \subseteq V(G)$ such that, knowing the value of $\delta(v)$ for each
$v \in E$, one may determine, for each $e \in E(G)$, the value of

$$\delta(e) = \prod_{v \in e} \delta(v). \qquad (1)$$



Given a hypergraph $G$ and a function $\gamma\colon V(G) \to \mathbb{R}$ we define the cost of a
set $A \subseteq V(G)$ by $\gamma(A) = \sum_{v \in A} \gamma(v)$.

An instance to the Dynamic Multipartite Ordering problem (DMO) is an $r(G)$-partite,
uniform hypergraph $G$, together with functions $\delta$ and $\gamma$ as above. The
objective in DMO is to determine an evaluation of minimum cost for $(G, \delta, \gamma)$.
Observe that while the value of $\gamma(v)$ is known in advance for each $v \in V(G)$, the
function $\delta$ is ‘unknown’ to us at first. More precisely, the value of $\delta(v)$ becomes
known only when $\delta(v)$ is actually evaluated, and this evaluation costs $\gamma(v)$. The
restriction of DMO to instances in which $r(G) = 2$ deserves special attention
and will be referred to as the Dynamic Bipartite Ordering problem (DBO).

Before we proceed, let us observe that DMO models our database problem
as follows: the sets in the partition $\{V_1, \dots, V_k\}$ of $V(G)$ correspond to the $k$
different attributes of the relation that is being queried and each vertex of $G$
corresponds to a distinct attribute value (tuple element). The edges correspond
to tuples in the relation, $\gamma(v)$ is the time required to evaluate $\delta$ on $v$ and $\delta(v)$
corresponds to the result of a predicate evaluated at the corresponding tuple
element.



Fig. 1. The set of tuples $\{(a_1, b_1), (a_1, b_2), (a_1, b_3), (a_2, b_1)\}$ and an instance for DBO



Figure 1(b) shows an instance of DBO. The value of $\delta(v)$ is indicated inside
each vertex $v$. Suppose that $\gamma(a_1) = 3$ and $\gamma(b_1) = \gamma(b_2) = \gamma(b_3) = 2$.
In this case, any strategy that starts evaluating $\delta(a_1)$ will return the evaluation
$\{a_1, b_1, b_2, b_3\}$, of cost 9. However, the evaluation of minimum cost for this
instance is $\{b_1, b_2, b_3\}$, of cost 6. This example highlights the key point: the problem
is to devise a strategy for dynamically choosing, based on the function $\gamma$
and the values of $\delta$ already revealed, the next vertex $v$ whose $\delta$-value should be
evaluated, so as to minimize the final, overall cost.
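The two costs quoted for the instance of Fig. 1 can be reproduced by brute force. The Python sketch below is an illustration only, under stated assumptions: the figure itself is not reproduced here, so the values $\delta(a_1) = 1$ and $\delta(b_j) = 0$ are inferred from the costs in the text, while $\delta(a_2)$ and $\gamma(a_2)$ are arbitrary choices made for the example. All function names are hypothetical.

```python
from itertools import combinations

# Tuples of Fig. 1, seen as edges of a bipartite graph.
EDGES = [("a1", "b1"), ("a1", "b2"), ("a1", "b3"), ("a2", "b1")]
# Costs of a1 and of the b's are from the text; the cost of a2 is an assumption.
GAMMA = {"a1": 3, "a2": 3, "b1": 2, "b2": 2, "b3": 2}
# Predicate values: assumed, chosen to agree with the costs quoted in the text.
DELTA = {"a1": 1, "a2": 1, "b1": 0, "b2": 0, "b3": 0}


def cost(vertices, gamma):
    """Total cost of probing the given vertices."""
    return sum(gamma[v] for v in vertices)


def is_evaluation(probed, edges, delta):
    """True if the probed values determine the product of delta over every edge."""
    for e in edges:
        known = [v for v in e if v in probed]
        has_zero = any(delta[v] == 0 for v in known)
        # An edge is determined if a probed vertex is 0 or all its vertices are probed.
        if not has_zero and len(known) < len(e):
            return False
    return True


def min_cost_evaluation(edges, gamma, delta):
    """Exhaustive search over vertex subsets (exponential; tiny instances only)."""
    vertices = sorted(gamma)
    best = None
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            if is_evaluation(set(subset), edges, delta):
                if best is None or cost(subset, gamma) < cost(best, gamma):
                    best = subset
    return set(best)


starting_with_a1 = {"a1", "b1", "b2", "b3"}
print(cost(starting_with_a1, GAMMA))            # cost of the evaluation that starts at a1
best = min_cost_evaluation(EDGES, GAMMA, DELTA)
print(sorted(best), cost(best, GAMMA))          # cheapest evaluation and its cost
```

Note that the search above sees all of $\delta$ in advance, which an algorithm for DMO cannot do; it only serves to compute the benchmark against which a strategy is compared.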

Let $A$ be an algorithm for DMO and let $I = (G, \delta, \gamma)$ be an instance to DMO.
We will denote the evaluation computed by $A$ on input $I$ by $A(I)$. Establishing
a measure for the performance of a given algorithm $A$ for DMO is somewhat
delicate: for example, a worst case analysis of $\gamma(A(I))$ is not suitable since any
correct algorithm should output an evaluation comprising all vertices in $V(G)$
when $\delta(v) = 1$ for every $v \in V(G)$ (if $G$ has no isolated vertices). This remark
motivates the following definition.

Given an instance $I = (G, \delta, \gamma)$, let $E$ be an evaluation for $I$ and let $\gamma^*(I)$
denote the cost of a minimum cost evaluation for $I$. We define the deficiency
of evaluation $E$ (with respect to $I$) as the ratio $d(E, I) = \gamma(E)/\gamma^*(I)$. Given an
algorithm $A$ for DMO, we define the deficiency of $A$ as the worst case deficiency
of the evaluation $A(I)$, where $I$ ranges over all possible instances of the problem,
that is, $d(A) = \max_I d(A(I), I)$.

If $A$ is a randomized algorithm, $d(A(I), I)$ is a random variable, and the
expected deficiency of $A$ is then defined as the maximum over all instances of
the mean of this random variable, that is,

$$d(A) = \max_I E[d(A(I), I)] = \max_I E[\gamma(A(I))]/\gamma^*(I).$$

Clearly, we wish to devise fast algorithms whose (expected) deficiency is as 
close to 1 as possible. In this paper, we are concerned with designing algorithms 
for DMO, analyzing them and establishing bounds for their deficiency. 

1.2 Statement of Results 

In Sect. 2 we start by giving lower bounds on the deficiency of deterministic 
and randomized algorithms for DMO (see Theorem 1). It is worth noting that 
these lower bounds apply even if we allow exponential time algorithms. We 
then present an optimal deterministic algorithm for DMO with time complexity
$O(|E(G)| \log r(G))$, developed with the primal-dual approach. As an aside,
we remark that this algorithm does not need to know the whole hypergraph in 
advance in order to solve the problem, since it scans the edges (tuples), evalu- 
ating each of them as soon as they become available. This is a most convenient 
feature for the database application that motivates this work. We also note that 
Feder et al. [4] independently obtained similar results. 

In Sect. 3, for any given $0 \le \varepsilon \le 1 - \sqrt{2}/2$, we present a randomized, polynomial
time algorithm $\mathcal{R}_\varepsilon$ for DBO whose expected deficiency is at most $2 - \varepsilon$.
The best expected deficiency is achieved when $\varepsilon = 1 - \sqrt{2}/2$. However, the
smaller the value of $\varepsilon$, the smaller is the probability that a particular execution
of $\mathcal{R}_\varepsilon$ will return a truly poor result: we show that the probability that
$d(\mathcal{R}_\varepsilon(I), I) \le 1 + 1/(1 - \varepsilon)$ holds is 1.

The deficiency of $\mathcal{R}_\varepsilon$ is not assured to be highly concentrated around the
expectation. In Sect. 3.1, we show that this limitation is inherent to the problem,
rather than a weakness of our approach: for any $0 < \varepsilon < 1$, no randomized
algorithm can have deficiency smaller than $1 + \varepsilon$ with probability larger than
$(1 + \varepsilon)/2$. The proof of this fact makes use of Yao’s Minimax Principle [9].

The reader is referred to the full version of this extended abstract for the 
proofs of the results (or else [6]). 

1.3 Related Work 

The problem of optimizing queries with expensive predicates has gained some 
attention in the database community [1,3, 5, 7, 8]. However, most of the proposed 
approaches [1,3,5] do not take into account the fact that an attribute value may 
appear in different tuples in order to decide how to execute the query. In this 
sense, they do not view the input relation as a general hypergraph, but as a 
set of tuples without any relation among them (i.e., as a matching hypergraph). 
The Predicate Migration algorithm proposed in [5], the main reference in this 
subject, may be viewed as an optimal algorithm for a variant of DMO, in which
the input graph is always a matching, the probability $p_i$ of a vertex from $V_i$
($i$th attribute) evaluating to true ($\delta(v) = 1$) is known, and the objective is to
minimize the expected cost of the computed evaluation (we omit the details).

The idea of processing the hypergraph induced by the input relation appears 
first in [8], where a greedy algorithm is proposed with no theoretical analysis. 
The distributed case of DBO, in which there are two available processors, say $P_A$
and $P_B$, responsible for evaluating $\delta$ on the nodes of the vertex classes $A$ and $B$
of the input bipartite graphs, is studied in [7]. The following results are presented
in [7]: a lower bound of 3/2 on the deficiency of any randomized algorithm, a
randomized polynomial time algorithm of expected deficiency 8/3, and a linear
time algorithm of deficiency 2 for the particular case of DBO with constant $\gamma$.
We observe that the approach here allows one to improve some of these results. 

In this extended abstract, we restrict our attention to conjunctive queries (in 
the sense of (1)). However, much more general queries could be considered. For 
example, $\delta\colon E(G) \to \{0, 1\}$ could be any formula in the first order propositional
calculus involving the predicates represented by $\delta$. In [2], Charikar et al. considered
the problem of querying priced information. In particular, they considered
the problem of evaluating a query that can be represented by an “and/or tree” 
over a set of variables, where the cost of probing each variable may be different. 
The framework for querying priced information proposed in that paper can be 
viewed as a restricted version of the problem described in this paragraph, where 
the input hypergraph has one single edge. It would be interesting to investigate 
DMO with such generalized queries. 

1.4 Preliminaries 

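Let $I = (G, \delta, \gamma)$ be an instance to DMO. The neighbourhood of $v \in V(G)$ is
the set $\Gamma(v) = \{u \in V(G) - \{v\} : \{u, v\} \subseteq e \text{ for some } e \in E(G)\}$. For any $X \subseteq V(G)$,
we let $V_0(X) = \{v \in X : \delta(v) = 0\}$, $V_1(X) = \{v \in X : \delta(v) = 1\}$,
$\Gamma(X) = \bigcup_{v \in X} \Gamma(v)$ and $\Gamma_1(X) = \Gamma(V_1(X))$.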

A cover for $G$ is a set $C \subseteq V(G)$ such that every edge of $G$ has at least
one vertex in $C$. A minimum cover for $(G, \gamma)$ is a cover $C$ for $G$ such that $\gamma(C)$
is minimal. Observe that any evaluation for $I$ must contain a cover for $G$ as a
subset, otherwise the $\delta$-value of at least one edge cannot be determined.

Let us now restrict our attention to DBO, the restricted case of DMO where
$G$ is a bipartite graph. Let $I = (G, \delta, \gamma)$ be an instance to DBO. For a cover $C$
for $G$, we call $E(C) = C \cup \Gamma_1(C)$ the $C$-evaluation for $I$. It is not difficult to
see that a $C$-evaluation for $I$ is indeed an evaluation for $I$. Moreover, since
any evaluation for $(G, \delta)$ must contain some cover for $G$ and $\Gamma_1(V(G))$, it is
not difficult to conclude that a $C$-evaluation for an instance to
DBO has deficiency at most 2, whenever $C$ is a minimum cover for $(G, \gamma)$. This
observation appears in [7] for the distributed version of DBO.

An optimal cover $C$ for $(G, \gamma)$, and as a consequence $E(C)$, may be computed
in polynomial time if $G$ is a bipartite graph. We use COVER to denote an
algorithm that outputs $E(C)$ for some minimum cover $C$. Since 2 is a lower
bound for the deficiency of any deterministic algorithm for DBO (see Sect. 2),
we have that COVER is a polynomial time, optimal deterministic algorithm
for DBO. This algorithm plays an important role in the design of the randomized
algorithm proposed in Sect. 3.

2 An Optimal Polynomial Deterministic Algorithm 

We start with some lower bounds for the deficiency of algorithms for DMO. 
It is worth noting that these bounds apply even to algorithms of exponential
time/space complexity. 

Theorem 1. (i) For any given deterministic algorithm $A$ for DMO and any
hypergraph $G$ with at least one edge, there exist functions $\gamma$ and $\delta$ such that
$d(A(G, \delta, \gamma)) \ge r(G)$.

(ii) For any given randomized algorithm $B$ for DMO and any hypergraph $G$
with at least one edge, there exist functions $\gamma$ and $\delta$ such that $d(B(G, \delta, \gamma)) \ge
(r(G) + 1)/2$.

2.1 An Optimal Polynomial Deterministic Algorithm for DMO 

We will now introduce a polynomial time, deterministic algorithm for DMO that
has deficiency at most $r(G)$ on an instance $I = (G, \delta, \gamma)$. In view of Theorem 1,
this algorithm has the best possible deficiency for a deterministic algorithm.

Let $(G, \delta, \gamma)$ be a fixed instance to DMO, and let $E_i = \{e \in E(G) : \delta(e) = i\}$
and $W_i = \bigcup_{e \in E_i} e$ ($i \in \{0, 1\}$).

We let $G[E_i]$ be the hypergraph with vertex set $W_i$ and edge set $E_i$. Let $\gamma_0^*$
be the cost of a minimum cover for $(G[E_0], \gamma)$, among all covers for $(G[E_0], \gamma)$
that contain vertices in $V_0 = V_0(V(G)) = \{v \in V(G) : \delta(v) = 0\}$ only. Then
$\gamma^*(G, \delta, \gamma) = \gamma_0^* + \gamma(W_1)$.

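Let us look at $\gamma_0^*$ as the optimal solution of the following Integer Programming
problem, which we will denote by $L_I(G, \delta, \gamma)$:

$$\min \Big\{ \sum_{v \in V_0} \gamma(v) x_v : \sum_{v \in e \cap V_0} x_v \ge 1 \text{ for all } e \in E_0 \text{ and } x_v \in \{0, 1\} \text{ for all } v \in V_0 \Big\}.$$

Let us denote by $L(G, \delta, \gamma)$ the linear relaxation of $L_I(G, \delta, \gamma)$, where the
restrictions $x_v \in \{0, 1\}$ are replaced by $x_v \ge 0$ for all $v \in V_0$. The dual $L(G, \delta, \gamma)^D$
of $L(G, \delta, \gamma)$ is

$$\max \Big\{ \sum_{e \in E_0} y_e : \sum_{e \in E_0:\, v \in e} y_e \le \gamma(v) \text{ for all } v \in V_0 \text{ and } y_e \ge 0 \text{ for all } e \in E_0 \Big\}.$$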



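Lemma 2. Let $(G, \delta, \gamma)$ be an instance to DMO and let $y\colon E_0 \to \mathbb{R}$ be a feasible
solution of $L(G, \delta, \gamma)^D$. Any evaluation $E$ of $(G, \delta)$ satisfying

$$\gamma(v) \le \sum_{e:\, v \in e} y_e \quad \text{for all } v \in E - W_1, \qquad (2)$$

has deficiency at most $r(G)$.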
The algorithm presented below uses a primal-dual approach to construct a
vector $y\colon E(G) \to \mathbb{R}$ and an evaluation $E$ such that both the restriction of $y$ to $E_0$
and $E$ satisfy the conditions of Lemma 2.

Our algorithm maintains for each $e \in E(G)$ a value $y_e$ and for every $v \in V(G)$
the value $r_v = \sum_{e:\, v \in e} y_e$. At each step, the algorithm selects an unevaluated
edge $e$ and increases the corresponding dual variable $y_e$ until it “saturates” the
next non-evaluated vertex $v$ ($r_v$ becomes equal to $\gamma(v)$). The values of $r_u$ ($u \in e$)
are updated and the vertex $v$ is then evaluated. If $\delta(v) = 0$, then the edge $e$
is added to $E_0$ along with all other edges that contain $v$, and the algorithm
proceeds to the next edge. Otherwise the algorithm increases the value of the
dual variable $y_e$ until it “saturates” another unevaluated vertex in $e$ and executes
the same steps until either $e$ is put into $E_0$ or there are no more unevaluated
vertices in $e$, in which case $e$ is put in $E_1$.



Algorithm $\mathcal{PD}(G, \delta, \gamma)$

1. Start with $E_0$, $E_1$ and $E$ as empty sets, $r_v = 0$ for all $v \in V(G)$ and $y_e = 0$
   for all $e \in E(G)$

2. While $E(G) \ne E_1 \cup E_0$

   a) Select an edge $e \in E(G) - (E_1 \cup E_0)$

   b) While $e \not\subseteq E$ and $e \notin E_0$

      i. select a vertex $v \in e - E$ such that $\gamma(v) - r_v$ is minimum

      ii. add $\gamma(v) - r_v$ to $y_e$ and to each $r_u$ such that $u \in e$

      iii. insert $v$ in $E$

      iv. If $\delta(v) = 0$, insert in $E_0$ every edge $e' \in E(G)$ such that $v \in e'$

   c) If $e \notin E_0$, insert $e$ in $E_1$

3. Return $E$
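The steps of the algorithm can be transcribed almost line by line. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: edges are processed in list order, ties are broken by vertex order within an edge, and the update of $E_0$ scans all edges, so the sketch does not attain the $O(|E(G)| \log r(G))$ running time. The test instance reuses the tuples of Fig. 1 with assumed values for $\delta$ and for $\gamma(a_2)$; all identifiers are hypothetical.

```python
def primal_dual_evaluation(edges, gamma, delta):
    """Sketch of the primal-dual algorithm: returns the set of probed vertices."""
    probed = set()                    # the evaluation E under construction
    zero_edges = set()                # indices of edges known to have value 0 (E_0)
    r = {v: 0 for v in gamma}         # r[v] = sum of y[e] over edges e containing v
    y = [0] * len(edges)              # one dual variable per edge

    for idx, e in enumerate(edges):
        # Step 2b: raise y[idx] until e is fully probed or contains a 0-vertex.
        while idx not in zero_edges and not set(e) <= probed:
            # Step i: unprobed vertex of e closest to saturation.
            v = min((u for u in e if u not in probed),
                    key=lambda u: gamma[u] - r[u])
            slack = gamma[v] - r[v]
            # Step ii: add the slack to y_e and to r_u for every u in e.
            y[idx] += slack
            for u in e:
                r[u] += slack
            # Step iii: probing v is where delta(v) is paid for and revealed.
            probed.add(v)
            # Step iv: a 0-vertex settles every edge that contains it.
            if delta[v] == 0:
                zero_edges.update(j for j, f in enumerate(edges) if v in f)
    return probed


# Tuples of Fig. 1; delta and the cost of a2 are assumptions made for the example.
EDGES = [("a1", "b1"), ("a1", "b2"), ("a1", "b3"), ("a2", "b1")]
GAMMA = {"a1": 3, "a2": 3, "b1": 2, "b2": 2, "b3": 2}
DELTA = {"a1": 1, "a2": 1, "b1": 0, "b2": 0, "b3": 0}

result = primal_dual_evaluation(EDGES, GAMMA, DELTA)
print(sorted(result), sum(GAMMA[v] for v in result))
```

On this instance the sketch probes $a_1$ and the three $b_j$, for a cost of 9, which is within the factor $r(G) = 2$ of the minimum cost 6 stated for Fig. 1.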



Lemma 3. Let $(G, \delta, \gamma)$ be an instance to DMO. At the end of the execution
of $\mathcal{PD}(G, \delta, \gamma)$, the restriction of $y$ to $E_0$ is a feasible solution to $L(G, \delta, \gamma)^D$
and $E$ is an evaluation of $(G, \delta)$ satisfying (2). Algorithm $\mathcal{PD}(G, \delta, \gamma)$ runs in
time $O(|E(G)| \log r(G))$.

Theorem 4. Algorithm $\mathcal{PD}$ is a polynomial time, optimal deterministic algorithm
for DMO.

3 The Bipartite Case and a Randomized Algorithm 

Let $0 \le \varepsilon \le 1 - \sqrt{2}/2$. In this section, we present $\mathcal{R}_\varepsilon$, a polynomial time
randomized algorithm for DBO with the following properties: for every instance $I$,
we have

$$E[d(\mathcal{R}_\varepsilon(I))] \le 2 - \varepsilon \qquad (3)$$

and

$$P\Big(d(\mathcal{R}_\varepsilon(I)) \le 1 + \frac{1}{1 - \varepsilon}\Big) = 1. \qquad (4)$$

Thus, $\mathcal{R}_\varepsilon$ provides a trade-off between expected deficiency and worst case
deficiency. At one extreme, when $\varepsilon = 1 - \sqrt{2}/2$, we have expected deficiency 1.707...
and worst case deficiency up to 2.41 for some particular execution. At the other
extreme ($\varepsilon = 0$), we have a deterministic algorithm with deficiency 2.

The key idea in $\mathcal{R}_\varepsilon$’s design is to try to understand under which conditions
the COVER algorithm described in Sect. 1.4 does not perform well. More exactly,
given an instance $I$ to DBO, a minimum cover $C$ for $(G, \delta)$, and $\varepsilon > 0$, we turn
our attention to the instances $I$ having $d(E(C), I) > 2 - \varepsilon$.

One family of such instances can be constructed as follows. Consider an
instance $(G, \delta, \gamma)$ to DBO where $G$ is a matching of $n$ edges, the vertex classes
of $G$ are $A$ and $B$, $\delta(v) = 1$ for every $v \in A$ and $\delta(v) = 0$ for every $v \in B$ and
$\gamma(v) = 1$ for every $v \in V(G)$. Clearly, $B$ is an optimum evaluation for $I$, with
cost $n$. On the other hand, note that the deficiency of the evaluation $E(C)$ which
is output by COVER depends on which of the $2^n$ minimum covers of $G$ is chosen
for $C$. In the particular case in which $C = A$, we have $d(E(C), I) = 2n/n = 2$.
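The matching example can be replayed numerically. The sketch below is illustrative only: it reads the $C$-evaluation of Sect. 1.4 as $E(C) = C \cup \Gamma_1(C)$, that is, the cover together with all neighbours of cover vertices whose $\delta$-value is 1, and it uses unit costs so that the cost of a set is its size. The function names and the choice $n = 4$ are assumptions made for the example.

```python
def neighbours(v, edges):
    """All vertices that share an edge with v."""
    return {u for e in edges if v in e for u in e if u != v}


def c_evaluation(cover, edges, delta):
    """E(C): the cover plus the neighbours of its vertices with delta = 1."""
    result = set(cover)
    for v in cover:
        if delta[v] == 1:
            result |= neighbours(v, edges)
    return result


def matching_instance(n):
    """Matching of n edges with delta = 1 on class A and delta = 0 on class B."""
    side_a = [f"a{i}" for i in range(n)]
    side_b = [f"b{i}" for i in range(n)]
    edges = list(zip(side_a, side_b))
    delta = {**{a: 1 for a in side_a}, **{b: 0 for b in side_b}}
    return side_a, side_b, edges, delta


A, B, edges, delta = matching_instance(4)
cost_via_A = len(c_evaluation(A, edges, delta))   # unit costs: cost equals size
cost_via_B = len(c_evaluation(B, edges, delta))
print(cost_via_A, cost_via_B, cost_via_A / cost_via_B)
```

Choosing the cover $A$ forces every vertex of $B$ to be probed as well, whereas the cover $B$ settles every edge on its own, which is the factor of 2 described above.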

This example suggests the following idea. If $C$ is a minimum cover for $(G, \gamma)$
and nonetheless $E(C)$ is not a “good evaluation” for $I = (G, \delta, \gamma)$, then there
must be another cover $C'$ of $G$ whose intersection with $C$ is “small” and still $C'$
is not “far from being” a minimum cover for $G$. The following lemma formalizes
this idea.

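Lemma 5. Let $I = (G, \delta, \gamma)$ be an instance to DBO, let $C$ be a minimum cover
for $(G, \delta)$ and let $0 < \varepsilon < 1$. If $d(E(C)) > 2 - \varepsilon$, then there is a vertex cover $C_\varepsilon$
for $G$ such that $\gamma(C_\varepsilon) < \gamma(C - C_\varepsilon)/(1 - \varepsilon)$.

Let $I = (G, \delta, \gamma)$, $C$ and $\varepsilon$ be as in the statement of Lemma 5. Let $C'$ be
a minimum cover for $(G, \gamma_{C,\varepsilon})$, where $\gamma_{C,\varepsilon}$ is given by $\gamma_{C,\varepsilon}(v) = (1 - \varepsilon)\gamma(v)$ if
$v \notin C$ and $\gamma_{C,\varepsilon}(v) = (2 - \varepsilon)\gamma(v)$ otherwise.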

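We can formulate the problem of finding a cover $C_\varepsilon$ satisfying $\gamma(C_\varepsilon) < \gamma(C - C_\varepsilon)/(1 - \varepsilon)$
as a linear program in order to conclude that such a cover exists
if and only if $\gamma_{C,\varepsilon}(C') < \gamma(C)$. Furthermore, if $\gamma_{C,\varepsilon}(C') < \gamma(C)$ then
$\gamma(C') < \gamma(C - C')/(1 - \varepsilon)$.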
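The last implication can be verified by expanding the modified cost. The short derivation below is a check rather than part of the original argument; it assumes that the modified cost is $(1 - \varepsilon)\gamma(v)$ for $v \notin C$ and $(2 - \varepsilon)\gamma(v)$ for $v \in C$, which is the reading consistent with the stated equivalence, and it uses $0 < \varepsilon < 1$.

```latex
\begin{align*}
\gamma_{C,\varepsilon}(C')
  &= (1-\varepsilon)\,\gamma(C' \setminus C) + (2-\varepsilon)\,\gamma(C' \cap C) \\
  &= (1-\varepsilon)\,\gamma(C') + \gamma(C' \cap C), \\
\gamma_{C,\varepsilon}(C') < \gamma(C)
  &\iff (1-\varepsilon)\,\gamma(C') < \gamma(C) - \gamma(C \cap C') = \gamma(C \setminus C') \\
  &\iff \gamma(C') < \frac{\gamma(C \setminus C')}{1-\varepsilon}.
\end{align*}
```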

This last remark, together with Lemma 5, provides an efficient way to verify
whether or not a particular minimum cover $C$ is going to give a good evaluation
for $(G, \delta, \gamma)$.

Since the cover $C'$ above can be computed in polynomial time when
$G$ is bipartite, we can devise the following randomized algorithm for DBO.



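Algorithm $\mathcal{R}_\varepsilon(G, \delta, \gamma)$

1. $C \leftarrow$ a minimum cover for $(G, \gamma)$

2. $C' \leftarrow$ a minimum cover for $(G, \gamma_{C,\varepsilon})$

3. If $\gamma_{C,\varepsilon}(C') > \gamma(C)$, then return $E(C)$

4. Let $p = (1 - 3\varepsilon + \varepsilon^2)/(1 - \varepsilon)$

5. Pick $x \in [0, 1]$ uniformly at random. Return $E(C)$ if $x < p$ and $E(C')$ otherwise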



The correctness of algorithm $\mathcal{R}_\varepsilon$ follows from the fact that $\mathcal{R}_\varepsilon$ always outputs
a cover evaluation (see Sect. 1.4). Properties (3) and (4) of the evaluation
computed by $\mathcal{R}_\varepsilon$, claimed at the beginning of Sect. 3, are assured by the next
result.

Theorem 6. Let $0 \le \varepsilon \le 1 - \sqrt{2}/2$. For any instance $I = (G, \delta, \gamma)$ we have
$E[d(\mathcal{R}_\varepsilon(I))] \le 2 - \varepsilon$ and $P(d(\mathcal{R}_\varepsilon(I)) \le (2 - \varepsilon)/(1 - \varepsilon)) = 1$.

Theorem 6 is tight when $\varepsilon = 1 - \sqrt{2}/2$. Indeed, consider the instance $I =
(G, \delta, \gamma)$, where $G$ is a complete bipartite graph with bipartition $\{A, B\}$, where
$|B| = 1.41|A| \approx \sqrt{2}|A|$, $\delta(a) = 0$ for every $a \in A$, $\delta(b) = 1$ for every $b \in B$, and
$\gamma(v) = 1$ for every $v \in V(G)$. Clearly, $A$ is an evaluation of cost $|A|$ since it only
checks the vertices in $A$. The set $B$, however, is a minimum cover for $(G, \gamma_{C,\varepsilon})$
and $\gamma_{C,\varepsilon}(B) < \gamma(A)$. Hence, $\mathcal{R}_\varepsilon(I)$ returns $E(A)$ with probability 1/2 and $E(B)$
with probability 1/2, so that the expected deficiency is close to $1 + \sqrt{2}/2$.
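The numbers in this tight example can be checked with a few lines of arithmetic. The Python sketch below is only a sanity check under stated assumptions: it takes $|B| = \sqrt{2}|A|$ exactly, unit costs, and the return probabilities of 1/2 stated in the text, and it reads $E(B)$ as $B \cup A$ because every vertex of $B$ has $\delta$-value 1. The function names are hypothetical.

```python
import math

EPS = 1 - math.sqrt(2) / 2          # the extreme value of epsilon


def worst_case_bound(eps):
    """Bound (2 - eps)/(1 - eps), equal to 1 + 1/(1 - eps)."""
    return (2 - eps) / (1 - eps)


def expected_deficiency(ratio_b_to_a, prob_a=0.5):
    """Expected deficiency when E(A) is returned with probability prob_a.

    E(A) = A costs |A| (deficiency 1); E(B) = B together with A costs |A| + |B|.
    """
    deficiency_a = 1.0
    deficiency_b = 1.0 + ratio_b_to_a
    return prob_a * deficiency_a + (1 - prob_a) * deficiency_b


print(worst_case_bound(EPS))                 # worst case over executions
print(expected_deficiency(math.sqrt(2)))     # expected deficiency on this instance
print(2 - EPS)                               # the bound of Theorem 6
```

Under these assumptions the expected deficiency on this instance coincides with $2 - \varepsilon = 1 + \sqrt{2}/2$, and the deficiency of $E(B)$ coincides with the worst case bound $1 + \sqrt{2}$.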

3.1 Lower Bound for Randomized Algorithms 

We have proved so far that algorithm $\mathcal{R}_\varepsilon$, for $\varepsilon = 1 - \sqrt{2}/2$, has expected
deficiency $\le 1 + \sqrt{2}/2 = 1.707\ldots$. However, $\mathcal{R}_\varepsilon$ does not achieve this deficiency
with high probability. For the instance described above, $\mathcal{R}_\varepsilon$ attains deficiency
2.41 with probability 1/2 and deficiency 1 with probability 1/2. One can speculate
whether a more dynamic algorithm would not have smaller (closer to 1.5)
deficiency with high probability. In this section, we prove that this is not possible,
that is, no randomized algorithm for DBO can have deficiency smaller
than $\mu$ for any given $1 < \mu < 2$ with probability close to 1 (see Theorem 8). We
shall prove this considering instances $I = (G, \delta, \gamma)$ with $G$ a balanced, complete
bipartite graph on $n$ vertices and with $\gamma \equiv 1$ only. All instances in this section
are assumed to be of this form.

Let $A$ be a randomized algorithm for DBO and let $1/2 < \lambda < 1$. Given an
instance $I = (G, \delta, \gamma)$ where $|V(G)| = n$, let $P(A, I, \lambda n) = P(\gamma(A(I)) \ge \lambda n)$
and let $P(A, \lambda n) = \max_I P(A, I, \lambda n)$. Given a deterministic algorithm $B$ and
an instance $I$ for DBO, we define the payoff of $B$ with respect to $I$ as $g(B, I) = 1$
if $\gamma(B(I)) \ge \lambda n$ and $g(B, I) = 0$ otherwise.

One may deduce from Yao's minimax principle [9] that, for any randomized
algorithm A, we have $\max_{\mathcal{I}} E[g(A, \mathcal{I})] \ge \max_p E[g(\mathrm{opt}, \mathcal{I}_p)]$, where opt is an
optimal deterministic algorithm, in the average case sense, for the probability
distribution p over the set of possible instances for DBO. (In the inequality
above, the expectation is taken with respect to the coin flips of A on the left-hand
side and with respect to p on the right-hand side; we write $\mathcal{I}_p$ for an
instance generated according to p.)

Since a randomized algorithm can be viewed as a probability distribution over
the set of deterministic algorithms, we have $E[g(A, \mathcal{I})] = P(A, \mathcal{I}, \lambda n)$ and hence
$\max_{\mathcal{I}} E[g(A, \mathcal{I})] = P(A, \lambda n)$. Moreover, $E[g(\mathrm{opt}, \mathcal{I}_p)]$ is the probability that the
cost of the evaluation computed by the optimal algorithm for the distribution p is
at least $\lambda n$. Thus, if we are able to define a probability distribution p over the set
of possible instances and analyze the optimal algorithm for such a distribution, 
we obtain a lower bound for $P(A, \lambda n)$.

Let n be an even positive integer and let G be a complete bipartite graph
with $V(G) = \{1, \ldots, n\}$. Let the vertex classes of G be $\{1, \ldots, n/2\}$ and
$\{n/2 + 1, \ldots, n\}$. Let $\gamma(v) = 1$ for all $v \in V(G)$. For $1 \le i \le n$, define the function
$\delta_i : V(G) \to \{0, 1\}$ putting $\delta_i(v) = 1$ if $i = v$ and $\delta_i(v) = 0$ otherwise. Consider
the probability distribution p where the only instances with positive probability
are $\mathcal{I}_i = (G, \delta_i, \gamma)$ ($1 \le i \le n$) and all these instances are equiprobable, with
probability 1/n each. A key property of these instances is that the cost of the 
optimum evaluation for all of them is n/2, since all the vertices of the vertex class 
of the graph that does not contain the vertex with $\delta$-value 1 must be evaluated
in order to determine the value of all edges. We have the following lemma. 

Lemma 7. Let opt be an optimal algorithm for the probability distribution p.
Then $E[g(\mathrm{opt}, \mathcal{I}_p)] \ge 1 - \lambda$.

Since $\gamma^*(\mathcal{I}_j) = n/2$ for $1 \le j \le n$, we have the following result.

Theorem 8. Let A be a randomized algorithm for DBO and let $1 \le \mu \le 2$ be a
real number. Then there is an instance $\mathcal{I}$ for which $P(d(A(\mathcal{I}), \mathcal{I}) \ge \mu) \ge 1 - \mu/2$.

Abstract. We present sublinear algorithms (algorithms that use significantly
less resources than needed to store or process the entire input
stream) for discovering representative trends in data streams in the form
of periodicities. Our algorithms involve sampling $O(\sqrt{n})$ positions, and
thus they scan not the entire data stream but merely a sublinear sample
thereof. Alternatively, our algorithms may be thought of as working on
streaming inputs where each data item is seen once, but we store only a
sublinear, $O(\sqrt{n})$ size, sample from which we can identify periodicities.

In this work we present a variety of definitions of periodicities of a given 
stream, present sublinear sampling algorithms for discovering them, and 
prove that the algorithms meet our specifications and guarantees. No 
previously known results can provide such guarantees for finding any 
such periodic trends. We also investigate the relationships between these 
different definitions of periodicity.

1 Introduction 

There is an abundance of time series data today collected by a varying and 
ever-increasing set of applications. For example, telecommunications companies
collect traffic information (number of calls, number of dropped calls, number of
bytes sent, number of connections, etc.) at each of their network links at small,
say 5-minute, intervals. Such data is used for business decisions, forecasting, 
sizing, etc. based on trend analysis. Similarly time-series data is crucially used 
in decision support systems in many arenas including finance, weather prediction, 
etc. 

There is a large body of work in time series data management, mainly on in- 
dexing, similarity searching, and mining of time series data to find various events 
and patterns. In this work, we are motivated by applications where the data is 
critically used for “trend analysis”. We study a specific representative trend of 
time series, namely, periodicity. No real life time series is exactly periodic; i.e., 
repetition of a single pattern over and over again does not occur. For example, 
the number of bytes sent over an IP link in a network is almost surely not a
perfect repeat of a daily, weekly or a monthly trend. However, many time series
data are likely to be “approximately” periodic.

The main objective of this paper is to determine if a time series data stream 
is approximately periodic. The area of Signal Analysis in Applied Mathematics 
largely deals with finding various periodic components of a time series data 
stream. A significant body of work exists on stochastic or statistical time series 
trend analysis about predicting future values and outlier detection that grapples 
with the almost periodic properties of time series data. 

In this paper we take a novel approach based on combinatorial pattern match- 
ing and random sampling to defining approximate periodicity and discovering 
approximate periodic behavior of time series data streams. The period of a 
data sequence is defined in terms of its self-similarity; this can be either in 
terms of the distance between the sequence and an appropriately shifted ver- 
sion of itself, or else in terms of the distance between different portions of the 
sequence. Motivated by these, our approach involves the following. We define 
several notions of self-distance for the input data streams for capturing the var- 
ious combinatorial notions of approximate periodicity. Data streams with small 
self-distances are deemed to be approximately periodic; given time series data 
stream $S = S[1] \cdots S[n]$, we may define its self-distance (with respect to a candidate
period $p$) in terms of $d(S[jp + 1 : (j + 1)p], S[ip + 1 : (i + 1)p])$, for some suitable
distance function $d(\cdot, \cdot)$ that captures the similarity between a pair of segments.
We may now consider the time series data to be approximately periodic if the 
distance is below a certain threshold. 

In this paper, we study algorithmic problems in discovering combinatorial 
periodic trends in time series data. Our main contributions are as follows. 

1. We formulate different self-distances for defining approximate periodicity
for time series data streams. Approximate periodicity in this sense will also 
indicate that only a small number of entries of the data set need to be 
changed to make it exactly periodic. 

2. We present sublinear algorithms for determining if the input data stream
is approximately periodic. In fact, our algorithms rely only on sampling a
sublinear, $O(\sqrt{n})$, number of positions in the input.

A technical aspect of our approach is that we keep a small pool of random 
samples, even if we do not know in advance what the period might be. We 
show that there is always a subsample of this pool sufficient to compute the 
self-distance under any potential period. In this sense, we “recycle” the random 
samples for one approximate period to perform computations for other periods. 
For two notions of periodicity we define here, our methods are quite simple; for
the third notion, the sampling (in Section 3.3) is more involved, with two stages
where the second stage depends on the first.

Related Work. Algorithmic literature on time series data analysis mostly focuses 
on indexing and searching problems, based on various distance measures amongst 
multiple time series data. Common distance measures are $L_p$ norms, hierarchical
distances motivated by wavelets, etc.¹

Although most available papers do not consider the combinatorial periodicity
notions we explore here, one relevant paper [6] aims to find the “average period” of a
given time series data in a combinatorial fashion. This paper describes $O(n \log n)$
space algorithms to estimate average periods by using sketches.

Our work here deviates from that in [6] in a number of ways. First, we
present the first known $o(n)$, in fact, $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$ space algorithm for
periodic trend analysis, in contrast to the $\omega(n)$ space methods in [6]. We do not
know of a way to employ sketches to design algorithms with our guarantees. 
Sampling seems to be ideal for us here: with a small number of samples we are 
able to perform computations for multiple period lengths. Second, we consider 
more general periodic trends than those in [6]. 

Sampling algorithms are known for computing Fourier coefficients with sub- 
linear space [2]. However, this algorithm is quite complex and expensive, using
samples for finding B significant periodic components - the $O(1)$
factor is rather large. In general, there is a rich theory of sampling in time series
data analysis [10,9]; our work is interesting in the way that it recycles random 
samples among multiple computations, and adds to this growing knowledge. Our 
methods are more akin to sublinear methods for property testing; see [4] for an 
overview. In particular, in parallel with this work and independent of it, the authors
of [1] present sublinear sampling methods for testing whether the edit distance
between two strings is at least linear or at most $n^\alpha$ for $\alpha < 1$ by obtaining
a directed sample set where the queries are at times evenly spaced within the
strings.

2 Notions of Approximate Periodicity 

Our definitions of approximate periodicity are based on the notion of exact pe- 
riodicity from combinatorial pattern matching. We will first review that notion 
before presenting our main results. 

Let S denote a time series data stream where each entry $S[i]$ is from a
constant size alphabet $\sigma$. We denote by $S[i : j]$ the segment of S between the $i$th
and the $j$th entries (inclusive). The exact periodicity of a data stream S with
respect to a period of size p can be described in two alternative but equivalent 
ways as follows. 

Definition 1. We say that a data stream S of size n is exactly p-periodic if 
either 

a. its size $n - p$ suffix and size $n - p$ prefix are identical; i.e., $S[1 : n - p] = S[p + 1 : n]$,
or alternatively,

b. S consists of repetitions of the same block B of size p; i.e., $S = B^k B'$ where
$B \in \sigma^p$, $B'$ is a prefix of B and $k = \lfloor n/p \rfloor$.
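
The equivalence of the two formulations in Definition 1 can be checked mechanically. The sketch below is an illustration only; the function names `is_periodic_shift` and `is_periodic_block` are hypothetical, and streams are modeled as Python strings with 0-based indexing.

```python
def is_periodic_shift(s: str, p: int) -> bool:
    """Definition 1(a): S[1 : n-p] equals S[p+1 : n] (1-indexed, inclusive)."""
    n = len(s)
    return s[: n - p] == s[p:]


def is_periodic_block(s: str, p: int) -> bool:
    """Definition 1(b): S = B^k B' with |B| = p, k = floor(n/p), B' a prefix of B."""
    k, r = divmod(len(s), p)
    block = s[:p]
    return s == block * k + block[:r]
```

For example, both functions accept "abcabcab" with p = 3 and both reject "abcabcaa".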

¹ A survey is in the tutorial offered at KDD 2000 [7]; see also [8].

When examining p-periodicity of a data stream S, we denote by $b_i^p$ the $i$th
block of S of size p, that is, $S[(i - 1)p + 1 : ip]$. Notice that $S = b_1^p, b_2^p, \ldots, b_k^p, b'$
where $k = \lfloor n/p \rfloor$ and $b'$ is the length $n - kp$ suffix of S. When the choice of p is
clear from the context, we drop it; i.e., we write $S = b_1, b_2, \ldots, b_k, b'$. For simplicity,
unless otherwise noted, we assume that the stream consists of a whole number of
blocks, i.e., $n = kp$ for some $k > 0$, for any p under consideration. Any unfinished
block at the end of the stream can be extended with don't care symbols until
the desired format is obtained.



2.1 Self Distances and Approximate Periodicity 

The above definitions of exact periodicity can be relaxed into a notion of ap- 
proximate periodicity as follows. Intuitively, a data stream S can be considered 
approximately periodic if it can be made exactly periodic by changing a small 
number of its entries. To formally define approximate periodicity, we present the 
notion of a “self-distance” for a data stream. We will call a stream S approxi- 
mately periodic if its self-distance is “small” . 

In what follows we introduce three self-distance measures (shiftwise, blockwise
and pairwise distances, denoted respectively as $D^p$, $E^p$ and $F^p$), each of which
is defined with respect to a “base” distance between two streams. We will first
focus on the Hamming distance $h(\cdot, \cdot)$ as the base distance for all three measures
and subsequently discuss how to generalize our methods to other base distances. 



Shiftwise Self Distance. We first relax Definition 1(a) of exact periodicity to
obtain what we call the shiftwise self-distance of a data stream. As a preliminary 
step we define a simple notion of self-distance that we call the single-shift self- 
distance as follows. 

Definition 2. The single-shift self-distance of a data stream S with respect to
period size p is $DS^p(S) = h(S[p + 1 : n], S[1 : n - p])$.

If one assumes for the sake of simplicity that $n = kp$, then it is possible to
write $S = b_1^p b_2^p \ldots b_k^p$, and alternatively define the single-shift self-distance of S
as $DS^p(S) = \sum_{i=1}^{k-1} h(b_i^p, b_{i+1}^p)$. Note that S is exactly p-periodic if and only if
$DS^p(S) = 0$.

Unfortunately the single-shift self-distance of S fails to provide a satisfactory 
basis for approximate periodicity. A small $DS^p(S)$ does not necessarily imply
that S can be made exactly p-periodic by changing a small number of its entries:
Let $p = 1$ and S = 00000000001111111111. It is easy to see that $DS^1(S) = 1$.
However, to make S periodic with $p = 1$ (in fact with any p) one needs to change
a linear number of entries of S.

Even though S is “self similar” under $DS^p(\cdot)$, it is clearly far from being
exactly periodic as stipulated in Definition 1. Thus while Definition 1 (a) and 
(b) are equivalent in the context of exact periodicity, their simple relaxations for 
approximate periodicity can be quite different. 
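
The gap between a small single-shift self-distance and true closeness to periodicity can be seen numerically. The sketch below is an illustration only: `single_shift_distance` follows Definition 2, while `min_changes_to_periodic` is a hypothetical helper that counts the fewest entries to change so that all positions congruent modulo p agree (keeping the majority symbol in each residue class).

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def single_shift_distance(s: str, p: int) -> int:
    """DS^p(S) = h(S[p+1 : n], S[1 : n-p]) from Definition 2."""
    return hamming(s[p:], s[: len(s) - p])


def min_changes_to_periodic(s: str, p: int) -> int:
    """Fewest entries to change so that s[i] == s[i + p] for every valid i."""
    total = 0
    for r in range(p):
        residue_class = s[r::p]  # positions r, r + p, r + 2p, ...
        total += len(residue_class) - max(Counter(residue_class).values())
    return total
```

On S = 00000000001111111111 with p = 1, the single-shift distance is 1 while ten entries must be changed to make S exactly 1-periodic.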

It is possible to generalize the notion of single-shift self-distance of S towards
a more robust measure of self-similarity. Observe that if a data stream S is
exactly p-periodic, it is also exactly 2p-, 3p-, ... periodic; i.e., if $DS^p(S) = 0$,
then $DS^{2p}(S) = DS^{3p}(S) = \ldots = 0$. However, when $DS^p(S) = \varepsilon > 0$ one
cannot say much about $DS^{2p}(S), DS^{3p}(S), \ldots$ in relation to $\varepsilon$. In fact, given
S and p, $DS^{ip}(S)$ can grow linearly with i: observe in the example above that
$DS^1(S) = 1$, $DS^2(S) = 2$, ..., $DS^i(S) = i$, ..., $DS^{n/2}(S) = n/2$. A more robust
notion of shiftwise self-distance can thus consider the self-distance of S w.r.t. all
multiples of p as follows.

Definition 3. The shiftwise self-distance of a given data stream S of length n
with respect to p is defined as

$$D^p(S) = \max_{j = 1, \ldots, n/p} h(S[jp + 1 : n], S[1 : n - jp]).$$

In the subsequent sections we show that the shiftwise self-distance can be 
used to relax both definitions of exact periodicity up to a constant factor. 
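
A direct, non-sampling evaluation of Definition 3 can serve as a reference when experimenting with the definitions. This is a minimal sketch with hypothetical function names, taking linear time per shift rather than the sublinear approach developed later.

```python
def shift_distance(s: str, shift: int) -> int:
    """Hamming distance between s and itself shifted by `shift` positions."""
    return sum(x != y for x, y in zip(s[shift:], s[: len(s) - shift]))


def shiftwise_distance(s: str, p: int) -> int:
    """D^p(S): the maximum of DS^{jp}(S) over j = 1, ..., floor(n/p)."""
    n = len(s)
    return max(shift_distance(s, j * p) for j in range(1, n // p + 1))
```

For S = 00000000001111111111 and p = 1 the shifts 1, 2, ..., 10 give distances 1, 2, ..., 10, so the shiftwise self-distance is 10 = n/2, in contrast to the single-shift value of 1.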



Blockwise Self Distance. Shiftwise self-distance is based on Definition 1(a) of
exact periodicity. We now define a self-distance based on the alternative definition,
which relates to the “average trend” of a data stream $S \in \sigma^n$ ([6]) defined
in terms of a “representative” block of S. More specifically, given block $b_j^p$
of S, we consider the distance of the given string from one which consists only
of repetitions of $b_j^p$. Define $E_j^p(S) = \sum_{\ell=1}^{k} h(b_\ell^p, b_j^p)$. Based on this notion of
average trend, our alternative measure of self-distance for S (also used in [6]) is
obtained as follows.

Definition 4. The blockwise self-distance of a data stream S of length n w.r.t.
p is defined as $E^p(S) = \min_i E_i^p(S)$.

Blockwise self-distance is closely related to the shiftwise self-distance as will 
be shown in the following sections. 
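
The blockwise self-distance can likewise be evaluated directly. The sketch below assumes that the distance of S from the repetition of block $b_j$ is the sum of Hamming distances from $b_j$ to every block, which is the form used in the proof of Theorem 2; the names `split_blocks` and `blockwise_distance` are hypothetical, and n is assumed to be a multiple of p.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_blocks(s: str, p: int) -> list:
    """Split s into consecutive blocks of size p (n must be a multiple of p)."""
    if p <= 0 or len(s) % p != 0:
        raise ValueError("length of s must be a positive multiple of p")
    return [s[i : i + p] for i in range(0, len(s), p)]


def blockwise_distance(s: str, p: int) -> int:
    """E^p(S): distance from S to the closest repetition of one of its own blocks."""
    blocks = split_blocks(s, p)
    return min(sum(hamming(b, rep) for b in blocks) for rep in blocks)
```

For instance, "abcabcabd" with p = 3 has blockwise self-distance 1, attained by the representative block "abc".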



Pairwise Self-Distance. We finally present our third definition of self-distance, 
which, for a given p, is based on comparing all pairs of size p blocks. We call this 
distance the pairwise self-distance and define it as follows. 

Definition 5. Let S consist of k blocks $b_1^p, \ldots, b_k^p$, each of size p. The pairwise
self-distance of S with respect to p and discrepancy $\delta$ is defined as

$$F^p(S, \delta) = \frac{1}{\binom{k}{2}} \left| \{ (b_i, b_j) : h(b_i, b_j) > \delta p \} \right|.$$

Observe that $F^p(S, \delta)$ is the ratio of “dissimilar” block pairs to all possible
block pairs and thus is a natural measure of self-distance. A pairwise self-distance
of $\varepsilon$ reflects an accurate measure of the number of entries that need to be changed
to make S exactly p-periodic up to an additive factor of $O((\varepsilon + \delta)n)$ and thus is
closely related to the other two self-distances.
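
A brute-force evaluation of the pairwise self-distance makes the "ratio of dissimilar block pairs" concrete. This is a sketch under two assumptions that are not confirmed by the text: block pairs are taken to be unordered, and the count is normalized by the number of such pairs. The function name is hypothetical, and at least two blocks are required.

```python
from itertools import combinations


def pairwise_distance(s: str, p: int, delta: float) -> float:
    """Fraction of unordered block pairs whose Hamming distance exceeds delta * p."""
    if p <= 0 or len(s) % p != 0 or len(s) // p < 2:
        raise ValueError("s must consist of at least two whole blocks of size p")
    blocks = [s[i : i + p] for i in range(0, len(s), p)]
    pairs = list(combinations(blocks, 2))
    dissimilar = sum(
        sum(x != y for x, y in zip(a, b)) > delta * p for a, b in pairs
    )
    return dissimilar / len(pairs)
```

With blocks "abab", "abab", "abba" and discrepancy 0.25, two of the three pairs differ in more than one position, so the value is 2/3; raising the discrepancy to 0.5 makes it 0.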

3 Sublinear Algorithms for Measuring Self-Distances and
Approximate Periodicity 

A data stream S is thought of as being approximately p-periodic if its self-distance
($D^p(S)$, $E^p(S)$ or $F^p(S, \delta)$) is below some threshold $\varepsilon$. Below, we present
sublinear algorithms for testing whether a given data stream S is approximately 
periodic under each of the three self-distance measures. We also demonstrate 
that all the three definitions of approximate periodicity are closely related and 
can be used to estimate the minimum number of entries that must be changed 
to make a data stream exactly periodic. 

We first define approximate periodicity under the three self-distance mea- 
sures. 

Definition 6. A data stream $S \in \sigma^n$ is $\varepsilon$-approximately p-periodic with respect
to $D^p$ (resp. $E^p$ and $F^p$) if $D^p(S) \le \varepsilon n$ (resp. $E^p(S) \le \varepsilon n$ and $F^p(S, \delta) \le \varepsilon n$)
for some $p \le n/2$.



3.1 Checking Approximate Periodicity under $D^p$

We now show how to check whether S is $\varepsilon$-approximately p-periodic for a fixed
$p \le n/2$ under $D^p$. We generalize this to finding the smallest p for which S
is $\varepsilon$-approximately p-periodic following the discussion on the other similarity
measures.

We remind the reader that, as is typical of probabilistic tests, our method distinguishes
self-distances of over $\varepsilon n$ from those below $\varepsilon' n$. In our case, $\varepsilon' = c\varepsilon$ for
some small constant $0 < c < 1$ which results from using probabilistic bounds.²
The behavior of our method is not guaranteed when the self-distance is between
$\varepsilon n$ and $\varepsilon' n$.

We first observe that to estimate $DS^p(S)$ within a constant factor, it suffices
to use a constant number of samples from S. More precisely, given $S \in \sigma^n$
and $p \le n/2$, one can determine whether $DS^p(S) < \varepsilon n$ or $DS^p(S) > \varepsilon' n$ with
constant probability using $O(1)$ random samples from S. This is because all we
need to do is to estimate whether $h(S[p + 1 : n], S[1 : n - p])$ is below $\varepsilon' n$ or above
$\varepsilon n$. A simple application of Chernoff bounds shows us that comparing a constant
number of sample pairs of the form $(S[i], S[i + p])$ is sufficient to obtain a correct
answer with constant probability.
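
The sampling idea for a single shift can be sketched as follows. This is an illustration only, not the paper's procedure: it draws independent uniform positions for one fixed p (the pooled, recycled sampling comes later), and the function name, the explicit `rng` argument and the scaling of the mismatch fraction to an estimate of $DS^p(S)$ are assumptions made for the example.

```python
import random


def estimate_single_shift(s: str, p: int, num_samples: int, rng: random.Random) -> float:
    """Estimate DS^p(S) from randomly chosen pairs (S[i], S[i + p])."""
    n = len(s)
    if not 0 < p < n or num_samples <= 0:
        raise ValueError("need 0 < p < len(s) and num_samples > 0")
    m = n - p  # number of comparable pairs (i, i + p)
    mismatches = 0
    for _ in range(num_samples):
        i = rng.randrange(m)  # uniform start position, 0-based
        mismatches += s[i] != s[i + p]
    # Scale the observed mismatch fraction up to the m comparable pairs.
    return mismatches / num_samples * m
```

Comparing the returned value against thresholds such as $\varepsilon n$ and $\varepsilon' n$ then gives the kind of two-sided test described above; the number of samples needed for a given confidence is not fixed by this sketch.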

Recall that to test whether S is $\varepsilon$-approximately p-periodic, we need to compute
each $DS^{ip}(S)$ separately for $ip \le n/2$. When p is small, there are a linear
number of such distances that we need to compute. If we choose to compute each

² Depending on $\varepsilon$, one has an amount of freedom in choosing c; for instance, c = 1/2
can be achieved through an application of Chernoff's or even Markov's inequality and
the confidence obtained can be boosted through increasing the number of samples
logarithmically in the confidence parameter. This will hold for the rest of this paper
as well, and we will use $\varepsilon$ and $\varepsilon'$ without mentioning their exact relationship with
this implicit understanding.
one separately, with different random samples (with the addition of a polylogarithmic
factor for guaranteeing correctness for each period tested) this translates
into a superlinear number of samples. To economize on the number of samples 
from S, we show how to “recycle” a sublinear pool of samples. This is viable as 
our analysis does not require the samples to be determined independently. 

Note that the definition of approximate periodicity w.r.t. $D^p$ leads to the
following property analogous to that of exact periodicity.

Property 1. If S is $\varepsilon$-approximately p-periodic under $D^p$ then it is $\varepsilon$-approximately
ip-periodic for all $i \le n/2p$.

Our ultimate goal thus is to find the smallest p for which S is $\varepsilon$-approximately
p-periodic. We now explore how many samples are needed to estimate $DS^p(S)$
in the above sense for all $p = 1, 2, \ldots, n/2$, which is sufficient for achieving our
goal.

In order to estimate $DS^p(S)$ for a specific p we need to compare $O(1)$ sample
pairs of the form $(S[i], S[i + p])$. We now would like to determine the number
of samples $S[i]$ required to guarantee that a sufficient number of sample pairs
$(S[i], S[i + p])$ will be available for each $n/2 \ge p \ge 1$. The following lemma states
that a pool of $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$ samples suffices.

Lemma 1. A uniformly random sample pool of size $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$ from S
guarantees that $O(1)$ sample pairs of the form $(S[i], S[i + p])$ are available for
every $1 \le p \le n/2$ with high probability.

Proof. For any given p, one can use the birthday paradox with $O(\sqrt{n})$ samples
to show the availability of $O(1)$ sample pairs of the form $(S[i], S[i + p])$ with
constant probability, say $1 - \rho$. For all possible values of p, the probability that
at least one of them will not provide enough samples is at most $1 - (1 - \rho)^{n/2}$.
Repeating the sampling $O(\mathrm{polylog}\, n)$ times, this failure probability can be reduced
to any desired $1/\mathrm{poly}\, n$. □
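
The "recycling" of one pool for every candidate period can be observed empirically. The sketch below is an illustration under stated assumptions: the pool is drawn without replacement, its size is c times the integer square root of n for a hypothetical constant c (standing in for the polylogarithmic factor), and `draw_pool` and `pairs_for_period` are made-up names.

```python
import math
import random


def draw_pool(n: int, c: int, rng: random.Random) -> set:
    """Draw c * floor(sqrt(n)) distinct positions from range(n) uniformly at random."""
    size = min(n, c * math.isqrt(n))
    return set(rng.sample(range(n), size))


def pairs_for_period(pool: set, p: int) -> list:
    """All index pairs (i, i + p) with both endpoints in the pool."""
    return sorted((i, i + p) for i in pool if i + p in pool)
```

With n = 2500 and c = 8 the pool holds 400 positions, and the expected number of usable pairs for a period p is roughly (n - p)(400/2500)^2, which is about 32 even for p = n/2.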

The lemma above demonstrates that by using $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$ independent
random samples from S one can test whether S is $\varepsilon$-approximately p-periodic
for any p. The theorem below follows.

Theorem 1. It is possible to test whether a given $S \in \sigma^n$ is $\varepsilon$-approximately p-periodic
or is not $\varepsilon'$-approximately p-periodic under $D^p$ by using $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$
samples and space with high probability.



3.2 Checking Approximate Periodicity under $E^p$

Even though the blockwise self-distance $E^p(S)$ seems to be quite different from
shiftwise self-distance $D^p(S)$, we show that the two measures are closely related.
In fact we show that $D^p(S)$ and $E^p(S)$ are within a factor of 2 of each other:

Theorem 2. Given $S \in \sigma^n$ and $p \le n/2$, $E^p(S)/2 \le D^p(S) \le 2E^p(S)$.
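
Proof. We first show the upper bound.

Let $b_i = b_i^p$ be the representative trend of S (of size p), that is, $i =
\arg\min_{1 \le j \le k} \sum_{\ell=1}^{k} h(b_\ell, b_j)$. By definition, $D^p(S) = \max_{1 \le j \le k} DS^{jp}(S) =
\max_j \sum_{\ell=1}^{k-j} h(b_\ell, b_{\ell+j})$.

By the triangle inequality, $D^p(S) \le \max_j \sum_{\ell=1}^{k-j} [h(b_\ell, b_i) +
h(b_i, b_{\ell+j})] \le \max_j [\sum_{\ell=1}^{k} h(b_\ell, b_i) + \sum_{\ell=1}^{k} h(b_i, b_\ell)]$. Since h is symmetric,
this is at most $2 \sum_{\ell=1}^{k} h(b_\ell, b_i)$, which is exactly $2 \cdot E^p(S)$.

For the lower bound, note that $E^p(S) \le \frac{1}{k} \sum_{i=1}^{k} \sum_{\ell=1}^{k} h(b_\ell, b_i)$.

But $D^p(S) \ge \frac{1}{k} \sum_{j=1}^{k} \sum_{\ell=1}^{k-j} h(b_{\ell+j}, b_\ell) \ge \frac{1}{2k} \sum_{i=1}^{k} \sum_{\ell=1}^{k} h(b_\ell, b_i) \ge E^p(S)/2$.

□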
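
The factor-2 relationship of Theorem 2 can be checked empirically on small random streams. The following sketch is only a numerical sanity check, not a proof; it assumes n is a multiple of p, uses the block-sum forms of the two self-distances that appear in the proof, and all function names are hypothetical.

```python
import random


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def shiftwise(s: str, p: int) -> int:
    """D^p(S): maximum over j of the Hamming distance between S and its jp-shift."""
    n = len(s)
    return max(hamming(s[j * p :], s[: n - j * p]) for j in range(1, n // p + 1))


def blockwise(s: str, p: int) -> int:
    """E^p(S): minimum over blocks b_i of the summed distance from b_i to all blocks."""
    blocks = [s[i : i + p] for i in range(0, len(s), p)]
    return min(sum(hamming(b, rep) for b in blocks) for rep in blocks)


def check_theorem2(trials: int, seed: int) -> bool:
    """Return True if E/2 <= D <= 2E holds on every randomly generated stream."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, k = rng.randint(1, 4), rng.randint(2, 6)
        s = "".join(rng.choice("ab") for _ in range(p * k))
        d, e = shiftwise(s, p), blockwise(s, p)
        if not (e / 2 <= d <= 2 * e):
            return False
    return True
```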

As Theorem 2 implies, the two notions of self-distance (under the Hamming
measure) are equivalent up to a factor of 2. We have shown how to test whether
the shiftwise self-distance of S, $D^p(S)$, is no more than some $\varepsilon n$ for any given
p by using only a sublinear ($O(\sqrt{n} \cdot \mathrm{polylog}\, n)$) number of samples from S and
similar space. Theorem 2 implies that this is also doable for $E^p(S)$; i.e.,
one can test whether the blockwise self-distance of S is no more than some $\varepsilon n$
for any given p by using $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$ samples from S and similar space.

The method presented in [6] can also perform this test by first constructing 
from S a superlinear ($O(kn \log n)$) size pool of “sketches”; here k is the size of
an individual sketch which depends on their confidence bound. Since this pool 
can be too large to fit in main memory, a scheme is developed to retrieve the 
pool from secondary memory in smaller chunks. In contrast, our overall memory 
requirement (and sample size) is sublinear; this comes at a price of some small 
loss of accuracy. 

Due to the fact that $D^p(\cdot)$ and $E^p(\cdot)$ are within a factor 2 of each other, they
can be estimated in the same manner. Thus, the theorem below follows from its
counterpart for $D^p$ (Theorem 1), which states that approximate p-periodicity
can be efficiently checked.

Theorem 3. It is possible to test whether a given $S \in \sigma^n$ is $\varepsilon$-approximately p-periodic
or is not $\varepsilon'$-approximately p-periodic under $E^p$ by using $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$
samples and space with high probability.

Here the “gap” between $\varepsilon$ and $\varepsilon'$ is within factor 4 of the gap for $D^p(\cdot)$.

Non-Hamming Measures. We showed above how to test whether a data
stream S of size n is $\varepsilon$-approximately p-periodic using self-distances $D^p(\cdot)$ and
$E^p(\cdot)$. We assumed that the comparison of blocks was done in terms of the Hamming
distance. We now show how to use other distances of interest.

First, consider the $L_1$ distance. Note that, since our alphabet $\sigma$ is of constant
size, the $L_1$ distance between two data streams is within a constant factor of
their Hamming distance. More specifically, let $q = |\sigma|$. Then, for any $R, S \in \sigma^n$,
$q \cdot h(R, S) \ge L_1(R, S)$. Thus, the method of estimating the Hamming distance
will satisfy the requirements of our test for $L_1$ albeit with different constant
factors. Let $D'$ and $E'$ be the self-distance measures which modify the Hamming
distance based measures of D and E by the use of $L_1$ distance. Then, for any
given p our estimate $D'^p(S)$ will still be within at most a constant factor of
$E'^p(S)$.

Now consider the $L_2$ distance. Again, assuming that our alphabet $\sigma$ is of
size q one can observe that, if $h(R, S) = p$, then $\sqrt{p} \le L_2(R, S) \le q\sqrt{p}$. Thus,
by making the necessary adjustments to the allowed distance, one can obtain
a test with different constant factors as with the $L_1$ distance. In fact, a similar
argument holds for any $L_i$ distance.

Similar discussions apply for $F^p$ as well and are hence omitted.
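
The constant-factor relations between the Hamming, $L_1$ and $L_2$ distances can be verified on concrete sequences. The sketch below assumes, purely for illustration, that the alphabet consists of the integers 0, ..., q - 1 so that differences between symbols are well defined; the function names are hypothetical.

```python
import math


def hamming(r, s) -> int:
    return sum(a != b for a, b in zip(r, s))


def l1(r, s) -> int:
    return sum(abs(a - b) for a, b in zip(r, s))


def l2(r, s) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))


def bounds_hold(r, s, q: int) -> bool:
    """Check h <= L1 <= q*h and sqrt(h) <= L2 <= q*sqrt(h) for symbols in range(q)."""
    h = hamming(r, s)
    return h <= l1(r, s) <= q * h and math.sqrt(h) <= l2(r, s) <= q * math.sqrt(h)
```

Each differing position contributes between 1 and q - 1 to the $L_1$ distance and between 1 and $(q-1)^2$ to the squared $L_2$ distance, which is where both pairs of bounds come from.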



3.3 Checking Approximate Periodicity under $F^p$

Recall that $F^p$ is a measure of the frequency of dissimilar blocks of size p in
S. In this section, we show how to efficiently test whether S is $\varepsilon$-approximately
p-periodic under $F^p$ (for any p where p is not known a priori); we will later
employ this technique to find all periods and the smallest period of S efficiently.
In order to be able to estimate $F^p(S, \delta)$ for all p, we would like to compare pairs
of blocks explicitly. This requires as many as polylogarithmic sample pairs within
each pair of blocks $(b_i, b_j)$ of size p that we compare. Unfortunately, our pool
of samples from the previous section turns out to be too small to yield enough
sample pairs of the above kind for all p; in fact, it can be seen easily that
a sublinear uniform random sample pool will never achieve the desired sample 
distribution and the desired confidence bounds in this case. Instead, we present 
a more directed sampling scheme, which will collect a sublinear size sample pool 
and still have enough samples to perform the test for any period p. 

A Two-Phase Scheme to Obtain The Sample Pool. To achieve a sub- 
linear sample pool from S which will have enough per block samples, we obtain 
our samples in two phases. 

In the first phase we obtain a uniform sample pool from S, as in the previous
section, of size $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$; these samples are called primary samples.

In the second phase, we obtain, for each primary sample $S[i]$, a polylogarithmic
set of secondary samples distributed identically around i (respecting the
boundaries of S). To do this, we pick $O(\mathrm{polylog}\, n)$ offsets relative to a generic
location i as follows. We pick $O(\log n)$ neighborhoods of size 1, 2, 4, 8, ..., n
around i.³ Neighborhood k refers to the interval $[i - 2^{k-1} : i + 2^{k-1} - 1]$; e.g.,
neighborhood 3 (of size 8) of $S[i]$ is $[i - 4 : i + 3]$. From each neighborhood we
pick $O(\mathrm{polylog}\, n)$ uniform random locations and note their positions relative to
i. Note that the choosing of offsets is performed only once for a generic i; the
same set of offsets will later be used for all primary samples.
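
The second phase can be sketched as follows. This is a minimal illustration, not the paper's implementation: n is assumed to be a power of 2, neighborhoods are indexed k = 1, ..., log2(n) so that neighborhood k covers offsets from $-2^{k-1}$ to $2^{k-1} - 1$, the number of offsets per neighborhood is a free parameter standing in for the polylogarithmic count, and the function names are hypothetical.

```python
import random


def choose_offsets(n: int, per_neighborhood: int, rng: random.Random) -> list:
    """Pick random offsets once, for a generic location; returns (k, offset) pairs."""
    offsets = []
    k = 1
    while 2 ** k <= n:
        half = 2 ** (k - 1)
        for _ in range(per_neighborhood):
            # Neighborhood k spans offsets -half, ..., half - 1 around the location.
            offsets.append((k, rng.randrange(-half, half)))
        k += 1
    return offsets


def secondary_samples(s: str, i: int, offsets: list) -> list:
    """Apply the shared offsets at primary position i, skipping out-of-range locations."""
    n = len(s)
    return [(k, i + d, s[i + d]) for k, d in offsets if 0 <= i + d < n]
```

Because the same offset list is reused at every primary sample, the secondary samples of any two primary samples sit at identical relative positions, which is the property the later block comparisons rely on. Each sample also records the index k of the neighborhood it came from.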

To obtain the secondary samples for any primary sample $S[i]$, we sample
the locations indicated by the offset set with respect to location i (as long as
the sample location is within S).⁴ Note that the secondary samples for any two
primary samples $S[i]$ and $S[j]$ are located identically around their respective
locations i and j.

³ Since we are only choosing offsets, we allow neighborhoods to go past the boundaries
of S. We handle invalid locations during the actual sampling. Also, for simplicity,
we assume n to be a power of 2.

⁴ For easier use in the algorithm later, for each sample the size of the neighborhood
from which it is picked is also noted.

Estimating $F^p$. We can now use standard techniques to decide whether
$F^p(S, \delta)$ is large or small. We start by uniformly picking primary sample pairs
$(S[i], S[j])$ such that $i - j$ is a multiple of p.⁵ Call the size p blocks containing
$S[i]$ and $S[j]$ $b_k$ and $b_l$. We can now proceed to check whether $h(b_k, b_l)$ is large
by comparing these two blocks at random locations. To obtain the necessary
samples for this comparison, we use our sample pool and the neighborhoods
used in creating it as follows. We consider the smallest neighborhood around $S[i]$
which contains $b_k$ and use the secondary samples of $S[i]$ from this neighborhood
that fall within $b_k$. We then pick samples from $b_l$ in a similar way and compare
the samples from $b_k$ and $b_l$ to check $h(b_k, b_l)$. We repeat the entire procedure for
the next block pair until sufficient block pairs have been tested.

To show that this scheme works, we first show that we have sufficient primary 
samples for any given p to compare enough pairs of blocks. To do this, for any p, 
we need to pick $O(\mathrm{polylog}\, n)$ pairs of size p blocks uniformly, which is possible
given our sample set as the following simple lemma demonstrates. 

Lemma 2. Consider all sample pairs $(S[i], S[j])$ from a set of $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$
primary samples uniformly picked from a data stream S of length n. Given any
$0 < p \le n/2$, the following hold with high probability:

(a) There are $\Omega(\mathrm{polylog}\, n)$ pairs $(S[i], S[j])$ that one can obtain from the
primary samples such that $i - j$ is a multiple of p.

(b) Consider block pair $(b_i, b_j)$ containing a sample pair $(S[i], S[j])$ as described
in (a). $(b_i, b_j)$ is uniformly distributed in the space of all block pairs
of S of size p.⁶

Proof. (a) follows easily from Lemma 1.

To see (b), consider two block pairs $(b_i, b_j)$ and $(b_k, b_l)$. There are p sample
pairs which will induce the picking of the former pair, and the same holds for 
the latter pair. Thus, any block pair will be picked with equal probability. 

Thus, our technique allows us to have, for any p, a polylogarithmic size 
uniform sample of block pairs of size p. Now, consider the secondary samples 
within a block that we pick for comparing two blocks as explained before. It
is easy to see that these particular samples are uniformly distributed within 
their respective blocks, since secondary samples within any one neighborhood 
are uniformly distributed. Additionally, they are located at identical locations 
within their blocks. All we need is there to be a sufficient number of such samples, 
which we argue below. 

⁵ There are several simple ways of doing this without violating our space bounds
which involve time/space tradeoffs that are not immediately relevant to this paper. 
Additionally, picking the pairs without replacement makes the final analysis more 
obvious but makes the selection process slightly more complicated. 

⁶ For simplicity we assume that p divides n; otherwise one needs to be a little careful
during the sampling to take care of the boundaries. 

Lemma 3. Let $S[i]$ and $S[i + rp]$ be two primary samples. Let $b_l$ and $b_m$ be
the blocks of size p that contain $S[i]$ and $S[i + rp]$ respectively. Then, with the
sampling scheme described above we will have picked sufficient secondary samples
to tell whether $h(b_l, b_m) > \delta p$ with high probability.

Proof. Consider t such that $2^{t-1} < p \le 2^t$. The $(t + 1)$-neighborhood of $S[i]$ is of
size at most 4p, and contains $b_l$. Since $b_l$ occupies at least 1/4 of this neighborhood,
it is expected to contain at least a quarter of the secondary samples of $S[i]$
from this neighborhood, which will be uniformly distributed in $b_l$. The case is
the same for $b_m$ and the samples it contains. As a result, we have $\Omega(\mathrm{polylog}\, n)$
uniform random samples from both $b_l$ and $b_m$, which, as we argued before, can
be viewed as pairs of points located identically within their respective blocks.
Then, one can test whether $h(b_l, b_m) > \delta p$ with high probability by comparing the
corresponding sample pairs from each block. □

Combining the choice of blocks and the comparison of block pairs, we obtain 
the following theorem. 

Theorem 4. It is possible to test whether a given $S \in \sigma^n$ is $\varepsilon$-approximately
p-periodic or is not $\varepsilon'$-approximately p-periodic under $F^p$ by using
$O(\sqrt{n} \cdot \mathrm{polylog}\, n)$ samples and space with high probability.

Since our algorithm does not require advance knowledge of p, to find all 
periods, the smallest period, etc. under this measure, it suffices to try the test 
with different values of p without increasing the sample size, as we argue in the 
next section. 



3.4 Checking Periodicity for All Periods 

So far we have focused on testing periodicity for one period. In general we may not
have access to a hypothetical period and may want to know whether a data
stream S is $\varepsilon$-approximately periodic with any period, and/or what its smallest
period p is. These can easily be determined once the particular similarity measure
is evaluated for all possible p. Since $D^p$ and $E^p$ involve computing similarities
for all p, for these two measures it is easy to extend the computation to all p. As
for $F^p$, checking for approximate periodicity for a fixed p is easy, but the trivial
technique of picking blocks and sampling will not extend to efficiently checking 
for all p. However, our technique as described in the previous section is specially 
designed so that its sample set will work with high probability for any and every 
valid p. Thus, checking periodicity for varying periods is now possible by using 
sublinear samples. 

Theorem 5. Given $S \in \sigma^n$, it is possible to perform any of the following tasks
under $D^p$, $E^p$, and $F^p$ by using $O(\sqrt{n} \cdot \mathrm{polylog}\, n)$ independent random samples
from S and similar space:

a) to find out if S is $\varepsilon$-approximately p-periodic,

b) to find all periods p (and thus the smallest period) for which S is
$\varepsilon$-approximately p-periodic.
Note that if the smallest approximate period of S is determined to be p then
we guarantee that $D^p(S) < \varepsilon' n$ and there exists no j < p such that $D^j(S) < \varepsilon n$.
The same holds for $E^p$ and $F^p$ as well.



3.5 Relationships between Three Notions of Approximate 
Periodicity 

We have already demonstrated that $D^p$ and $E^p$ are equivalent up to a factor of
2. We now demonstrate that $F^p$ is closely related to both of these definitions of
self-similarity in the sense that all three measures can be used to test whether a
data stream S is almost periodic as follows.

Definition 7. A data stream S of size n is called almost p-periodic w.r.t. $\gamma$ if
$\gamma n$ of its entries can be changed to make S exactly p-periodic.
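
As a small illustration of Definition 7 (not part of the original text), the fewest entries that must change to make a stream exactly p-periodic can be computed directly: positions with the same index modulo p must all agree, so each such class keeps its most frequent symbol. The function names below are hypothetical, and reading "$\gamma n$ entries" as "at most $\gamma n$ entries" is an assumption.

```python
from collections import Counter


def min_changes_to_periodic(stream, p):
    """Fewest entries to change so that stream[i] == stream[i + p] wherever both exist."""
    changes = 0
    for r in range(p):
        # Entries at positions r, r + p, r + 2p, ... must all be equal.
        residue_class = stream[r::p]
        if residue_class:
            most_common_count = Counter(residue_class).most_common(1)[0][1]
            changes += len(residue_class) - most_common_count
    return changes


def is_almost_periodic(stream, p, gamma):
    """Assumed reading of Definition 7: at most gamma * n changes suffice."""
    return min_changes_to_periodic(stream, p) <= gamma * len(stream)
```

This exact computation reads the whole stream, so it only serves as a reference point for the sublinear tests discussed in this section.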

The next lemma relates the notion of approximate periodicity under $E^p$ (and
thus $D^p$) with that of almost periodicity.

Lemma 4. If a data stream S is $\varepsilon$-approximately p-periodic under $E^p$ then it is
almost p-periodic w.r.t. $\gamma$ for $\varepsilon/2 \le \gamma \le \varepsilon$.

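Proof. Let $B = \arg\min_{b \in \sigma^p} \sum_{i=1}^{k} h(b_i, b)$. Clearly S is almost p-periodic if
$\sum_{i=1}^{k} h(b_i, B) \le \gamma n$. Similarly let $b_{\hat\imath}$ be the representative trend of S; i.e.
$\hat\imath = \arg\min_{1 \le j \le k} \sum_{i=1}^{k} h(b_i, b_j)$. However:

$$k\varepsilon = k \sum_{i=1}^{k} h(b_i, b_{\hat\imath}) \le 2k \sum_{i=1}^{k} h(b_i, B) \le 2k\gamma.$$
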

The second part of the inequality is trivial. □ 

Finally one can easily verify the following relationship between approximate
periodicity under $F^p$ and almost periodicity.

Lemma 5. If for a given data stream S, $F^p(S, \delta) \le \varepsilon$ then S can be made
exactly periodic by changing $O((\delta + \varepsilon)n)$ of its entries.



4 Concluding Remarks 

We introduced new notions of time series data streams being approximately 
periodic based on significance of combinatorial scores in terms of self-distances. 
We presented the first known sublinear, $\tilde{O}(\sqrt{n})$ space, algorithms for detecting
such approximate periodicities in time series data streams based on sampling, 
and reusing these random samples for multiple potential period lengths. Besides 
such periodicities, there may be other representative trends in a data stream; 
it could be interesting to develop efficient, sublinear sampling algorithms for 
detecting such trends. 
Abstract. We introduce a new sublinear space data structure, the Count-Min
Sketch, for summarizing data streams. Our sketch allows fundamental queries
in data stream summarization such as point, range, and inner product queries to
be approximately answered very quickly; in addition, it can be applied to solve
several important problems in data streams such as finding quantiles, frequent
items, etc. The time and space bounds we show for using the CM sketch to solve
these problems significantly improve those previously known, typically from
$1/\varepsilon^2$ to $1/\varepsilon$ in factor.

1 Introduction 

We consider a vector a, which is presented in an implicit, incremental fashion. This vector
has dimension n, and its current state at time t is $a(t) = [a_1(t), \ldots, a_i(t), \ldots, a_n(t)]$.
Initially, a is the zero vector, $a_i(0) = 0$ for all i. Updates to individual entries of
the vector are presented as a stream of pairs. The t-th update is $(i_t, c_t)$, meaning that
$a_{i_t}(t) = a_{i_t}(t-1) + c_t$, and $a_{i'}(t) = a_{i'}(t-1)$ for all $i' \ne i_t$. At any time t, a query
calls for computing certain functions of interest on $a(t)$.

This setup is the data stream scenario that has emerged recently. Algorithms for com- 
puting functions within the data stream context need to satisfy the following desiderata. 
First, the space used by the algorithm should be small, at most poly-logarithmic in n, the 
space required to represent a explicitly. Since the space is sublinear in data and input size,
the data structure used by the algorithms to represent the input data stream is merely a
summary of it (also known as a sketch or synopsis [10]); because of this compression, almost no
function that one needs to compute on a can be done precisely, so some approximation
is provably needed. Second, processing an update should be fast and simple; likewise,
answering queries of a given type should be fast and have usable accuracy guarantees.
Typically, accuracy guarantees will be made in terms of a pair of user specified
parameters, $\varepsilon$ and $\delta$, meaning that the error in answering a query is within a factor of $\varepsilon$ with
probability at least $1 - \delta$. The space and update time will consequently depend on $\varepsilon$ and $\delta$; our goal
will be to limit this dependence as much as is possible.

Many applications that deal with massive data, such as Internet traffic analysis and 
monitoring contents of massive databases, motivate this one-pass data stream setup. 

* Supported by NSF ITR 0220280 and NSF EIA 02-05116.
There has been a frenzy of activity recently in the Algorithm, Database and Networking
communities on such data stream problems, with multiple surveys, tutorials, workshops
and research papers. See [7,3,16] for detailed descriptions of the motivations driving this
area.

In recent years, several different sketches have been proposed in the data stream 
context that allow a number of simple aggregation functions to be approximated. Quantities
for which efficient sketches have been designed include the $L_1$ and $L_2$ norms of
vectors [2], the number of distinct items in a sequence (i.e., the number of non-zero entries in
$a(t)$) [8], join and self-join sizes of relations (representable as inner products of vectors
$a(t)$, $b(t)$) [2,1], and item and range sum queries [12,4]. These sketches are of interest not
simply because they can be used to directly approximate quantities of interest, but also 
because they have been used considerably as “black box” devices in order to compute 
more sophisticated aggregates and complex quantities: quantiles [13], wavelets [12], 
and histograms [11]. Sketches thus far designed are typically linear functions of their 
input, and can be represented as projections of an underlying vector representing the data 
with certain randomly chosen projection matrices. This means that it is easy to compute 
certain functions on data that is distributed over sites, by casting them as computations 
on their sketches. So, they are suited for distributed applications too. 

While sketches have proved powerful, they have the following drawbacks. 

- Although sketches use small space, the space used typically has an $\Omega(1/\varepsilon^2)$
multiplicative factor. This is discouraging because $\varepsilon = 0.1$ or $0.01$ is quite reasonable and
already this factor proves expensive in space, and consequently, often, in per-update
processing and function computation times as well.

- Many sketch constructions require time linear in the size of the sketch to process 
each update to the underlying data [2,13]. Sketches are typically a few kilobytes up 
to a megabyte or so, and processing this much data for every update severely limits 
the update speed. 

- Sketches are typically constructed using hash functions with strong independence 
guarantees, such as p-wise independence [2], which can be complicated to evaluate, 
particularly for a hardware implementation. One of the fundamental questions is to 
what extent such sophisticated independence properties are needed. 

- Many sketches described in the literature are good for one single, pre-specified 
aggregate computation. Given that in data stream applications one typically monitors 
multiple aggregates on the same stream, this calls for using many different types of 
sketches, which is a prohibitive overhead. 

- Known analyses of sketches hide large multiplicative constants in big-Oh notation. 

Given that the area of data streams is being motivated by extremely high performance
monitoring applications (e.g., see [7] for response time requirements for data stream
algorithms that monitor IP packet streams), these drawbacks ultimately limit the use of
many known data stream algorithms within suitable applications.

We will address all these issues by proposing a new sketch construction, which we
call the Count-Min, or CM, sketch. This sketch has the advantages that: (1) space used is
proportional to $1/\varepsilon$; (2) the update time is significantly sublinear in the size of the sketch;
(3) it requires only pairwise independent hash functions that are simple to construct; (4)
this sketch can be used for several different queries and multiple applications; and (5)
all the constants are made explicit and are small. Thus, for the applications we discuss,
our constructions strictly improve the space bounds of previous results from $1/\varepsilon^2$ to $1/\varepsilon$
and the time bounds from $1/\varepsilon^2$ to 1, which is significant.

Recently, an $\Omega(1/\varepsilon^2)$ space lower bound was shown for a number of data stream
problems: approximating frequency moments $F_k(t) = \sum_i |a_i(t)|^k$, estimating the number
of distinct items, and computing the Hamming distance between two strings [17]. It is an
interesting contrast that for a number of similar seeming problems (finding Heavy Hitters
and Quantiles in the most general data stream model) we are able to give an $O(1/\varepsilon)$ upper
bound. Conceptually, CM Sketch also represents progress since it shows that pairwise
independent hash functions suffice for many of the fundamental data stream
applications. From a technical point of view, CM Sketch and its analyses are quite simple. We
believe that this approach moves some of the fundamental data stream algorithms from
the theoretical realm to the practical. Our results have some technical nuances: (1) The
accuracy estimates for individual queries depend on the $L_1$ norm of $a(t)$ in contrast to
the previous works that depend on the $L_2$ norm. (2) Most prior sketch constructions
relied on embedding into small dimensions to estimate norms. Avoiding such embeddings
allows our construction to avoid $\Omega(1/\varepsilon^2)$ lower bounds on these embeddings.

2 Preliminaries 

We consider a vector a, which is presented in an implicit, incremental fashion. This vector
has dimension n, and its current state at time t is $a(t) = [a_1(t), \ldots, a_i(t), \ldots, a_n(t)]$. For
convenience, we shall usually drop t and refer only to the current state of the vector.
Initially, a is the zero vector, 0, so $a_i(0)$ is 0 for all i. Updates to individual entries of
the vector are presented as a stream of pairs. The t-th update is $(i_t, c_t)$, meaning that

$$a_{i_t}(t) = a_{i_t}(t-1) + c_t; \qquad a_{i'}(t) = a_{i'}(t-1) \quad \forall i' \ne i_t$$

In some cases, the $c_t$'s will be strictly positive, meaning that entries only increase; in other
cases, the $c_t$'s are allowed to be negative also. The former is known as the cash register
case and the latter the turnstile case [16]. There are two important variations of the
turnstile case to consider: whether the $a_i$'s may become negative, or whether the application
generating the updates guarantees that this will never be the case. We refer to the first of
these as the general case, and the second as the non-negative case. Many applications 
that use sketches to compute queries of interest — such as monitoring database contents, 
analyzing IP traffic seen in a network link — guarantee that counts will never be negative. 
However, the general case occurs in important scenarios too, for example in distributed 
settings where one considers the subtraction of one vector from another, say. 
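
To make the three update models concrete, here is a minimal sketch (not from the paper) that replays a stream of $(i_t, c_t)$ pairs on an explicit vector and reports which model the stream fits. The function name and the model labels are hypothetical; an explicit vector of size n is of course exactly what a sketch is meant to avoid.

```python
def apply_updates(n, updates):
    """Replay (i_t, c_t) updates on an explicit vector a[1..n] and classify the stream."""
    a = [0] * (n + 1)            # 1-indexed; a[0] is unused
    strictly_positive = True     # every c_t > 0 so far
    stayed_non_negative = True   # no a_i ever dropped below zero
    for i, c in updates:
        a[i] += c
        if c <= 0:
            strictly_positive = False
        if a[i] < 0:
            stayed_non_negative = False
    if strictly_positive:
        model = "cash register"
    elif stayed_non_negative:
        model = "turnstile (non-negative)"
    else:
        model = "turnstile (general)"
    return a[1:], model
```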

At any time t, a query calls for computing certain functions of interest on a{t). We 
focus on approximating answers to three types of query based on vectors a and b. 

- A point query, denoted $Q(i)$, is to return an approximation of $a_i$.

- A range query $Q(l, r)$ is to return an approximation of $\sum_{i=l}^{r} a_i$.

- An inner product query, denoted $Q(a, b)$, is to approximate $a \odot b = \sum_{i=1}^{n} a_i b_i$.

These queries are related: a range query is a sum of point queries; both point and 
range queries are specific inner product queries. However, in terms of approximations 
to these queries, results will vary. These are the queries that are fundamental to many 
applications in data stream algorithms, and have been extensively studied. In addition, 
they are of interest in non-data stream contexts. For example, in databases, the point and
range queries are of interest in summarizing the data distribution approximately; and 
inner-product queries allow approximation of join size of relations. Fuller discussion of 
these aspects can be found in [9,16]. 

We will also study the use of these queries to compute more complex functions on
data streams. As examples, we will focus on the two following problems. Recall that
$\|a\|_1 = \sum_{i=1}^{n} |a_i(t)|$; more generally, $\|a\|_p = \left(\sum_{i=1}^{n} |a_i(t)|^p\right)^{1/p}$.

- ($\phi$-Quantiles) The $\phi$-quantiles of the cardinality $\|a\|_1$ multiset of (integer) values
each in the range $1 \ldots n$ consist of those items with rank $k\phi\|a\|_1$ for $k = 0 \ldots 1/\phi$
after sorting the values. Approximation comes by accepting any integer that is
between the item with rank $(k\phi - \varepsilon)\|a\|_1$ and the one with rank $(k\phi + \varepsilon)\|a\|_1$ for
some specified $\varepsilon < \phi$.

- (Heavy Hitters) The $\phi$-heavy hitters of a multiset of $\|a\|_1$ (integer) values each in
the range $1 \ldots n$ consist of those items whose multiplicity exceeds the fraction $\phi$ of
the total cardinality, i.e., $a_i \ge \phi\|a\|_1$. There can be between 0 and $1/\phi$ heavy hitters
in any given sequence of items. Approximation comes by accepting any i such that
$a_i \ge (\phi - \varepsilon)\|a\|_1$ for some specified $\varepsilon < \phi$.

Our goal is to solve the queries and the problems above using a sketch data structure,
that is, using space and time significantly sublinear (polylogarithmic) in input size n
and $\|a\|_1$. All our algorithms will be approximate and probabilistic; they need two
parameters, $\varepsilon$ and $\delta$, meaning that the error in answering a query is within a factor of
$\varepsilon$ with probability at least $1 - \delta$. Both these parameters will affect the space and time needed by
our solutions. Each of these queries and problems has a rich history of work in the data 
stream area. We refer the readers to surveys [16,3], tutorials [9], as well as the general 
literature. 



3 Count-Min Sketches 

We now introduce our data structure, the Count-Min, or CM, sketch. It is named after
the two basic operations used to answer point queries, counting first and computing the
minimum next. We use e to denote the base of the natural logarithm function, ln.

Data Structure. A Count-Min (CM) sketch with parameters $(\varepsilon, \delta)$ is represented by a
two-dimensional array of counts with width w and depth d: $count[1, 1] \ldots count[d, w]$. Given
parameters $(\varepsilon, \delta)$, set $w = \lceil e/\varepsilon \rceil$ and $d = \lceil \ln(1/\delta) \rceil$. Each entry of the array is initially zero.
Additionally, d hash functions $h_1 \ldots h_d : \{1 \ldots n\} \to \{1 \ldots w\}$ are chosen uniformly
at random from a pairwise-independent family.

Update Procedure. When an update $(i_t, c_t)$ arrives, meaning that item $a_{i_t}$ is updated by
a quantity of $c_t$, then $c_t$ is added to one count in each row; the counter is determined by
$h_j$. Formally, set $\forall\, 1 \le j \le d$: $count[j, h_j(i_t)] \leftarrow count[j, h_j(i_t)] + c_t$. The space used
by Count-Min sketches is the array of wd counts, which takes wd words, and d hash
functions, each of which can be stored using 2 words when using the pairwise functions
described in [15].
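
The data structure and update procedure can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the class name, the `seed` parameter, and the hash family $h(x) = ((ax + b) \bmod P) \bmod w$ with a Mersenne prime P are assumptions. That family stores two words (a, b) per function and is only approximately pairwise independent after the final reduction modulo w.

```python
import math
import random

PRIME = (1 << 61) - 1  # assumed to exceed every item identifier


class CountMinSketch:
    """Array of d rows by w counters, one hash function per row (hypothetical helper)."""

    def __init__(self, eps, delta, seed=0):
        self.w = math.ceil(math.e / eps)           # width  w = ceil(e / eps)
        self.d = math.ceil(math.log(1.0 / delta))  # depth  d = ceil(ln(1 / delta))
        rng = random.Random(seed)
        # Two words (a, b) per hash function, as in the space accounting above.
        self.hashes = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(self.d)]
        self.count = [[0] * self.w for _ in range(self.d)]

    def _bucket(self, j, i):
        a, b = self.hashes[j]
        return ((a * i + b) % PRIME) % self.w

    def update(self, i, c=1):
        """Add c to exactly one counter in each of the d rows."""
        for j in range(self.d):
            self.count[j][self._bucket(j, i)] += c
```

Each update touches d counters regardless of w, which is why the update time is sublinear in the sketch size.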
4 Approximate Query Answering Using CM Sketches

For each of the three queries introduced in Section 2: Point, Range, and Inner Product 
queries, we show how they can be answered using Count-Min sketches. 

4.1 Point Query 

We first show the analysis for point queries for the non-negative case. 

Estimation Procedure. The answer to $Q(i)$ is given by $\hat{a}_i = \min_j count[j, h_j(i)]$.

Theorem 1. The estimate $\hat{a}_i$ has the following guarantees: $a_i \le \hat{a}_i$; and, with
probability at least $1 - \delta$, $\hat{a}_i \le a_i + \varepsilon\|a\|_1$.

Proof. We introduce indicator variables $I_{i,j,k}$, which are 1 if $(i \ne k) \wedge (h_j(i) = h_j(k))$,
and 0 otherwise. By pairwise independence of the hash functions, then

$$E(I_{i,j,k}) = \Pr[h_j(i) = h_j(k)] \le 1/\mathrm{range}(h_j) = \frac{\varepsilon}{e}.$$

Define the variable $X_{i,j}$ (random over the choices of $h_j$) to be $X_{i,j} = \sum_{k=1}^{n} I_{i,j,k}\, a_k$.
Since all $a_i$ are non-negative in this case, $X_{i,j}$ is a non-negative variable. By
construction, $count[j, h_j(i)] = a_i + X_{i,j}$. So, clearly, $\min_j count[j, h_j(i)] \ge a_i$. For the other
direction, observe that

$$E(X_{i,j}) = E\left(\sum_{k=1}^{n} I_{i,j,k}\, a_k\right) \le \sum_{k=1}^{n} a_k\, E(I_{i,j,k}) \le \frac{\varepsilon}{e}\|a\|_1$$

by pairwise independence of $h_j$, and linearity of expectation. By the Markov inequality,

$$\Pr[\hat{a}_i > a_i + \varepsilon\|a\|_1] = \Pr[\forall j.\ count[j, h_j(i)] > a_i + \varepsilon\|a\|_1]$$
$$= \Pr[\forall j.\ a_i + X_{i,j} > a_i + \varepsilon\|a\|_1]$$
$$\le \Pr[\forall j.\ X_{i,j} > e\,E(X_{i,j})] < e^{-d} \le \delta. \qquad □$$
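
The estimation procedure adds a single method to the structure: take the minimum over the d counters the item hashes to. The sketch below is a self-contained illustration under the same assumptions as before (hypothetical class name, `seed` parameter, and an $((ax + b) \bmod P) \bmod w$ hash family that is only approximately pairwise independent).

```python
import math
import random

PRIME = (1 << 61) - 1  # assumed to exceed every item identifier


class CountMinSketch:
    def __init__(self, eps, delta, seed=0):
        self.w = math.ceil(math.e / eps)
        self.d = math.ceil(math.log(1.0 / delta))
        rng = random.Random(seed)
        self.hashes = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(self.d)]
        self.count = [[0] * self.w for _ in range(self.d)]

    def _bucket(self, j, i):
        a, b = self.hashes[j]
        return ((a * i + b) % PRIME) % self.w

    def update(self, i, c=1):
        for j in range(self.d):
            self.count[j][self._bucket(j, i)] += c

    def point_query(self, i):
        """Estimate a_i as the minimum of the d counters that item i maps to."""
        return min(self.count[j][self._bucket(j, i)] for j in range(self.d))
```

With non-negative updates every counter holding item i contains $a_i$ plus collisions, so the minimum can never fall below $a_i$; only the upper bound of Theorem 1 is probabilistic.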

The time to produce the estimate is $O(\ln\frac{1}{\delta})$ since finding the minimum count can be
done in linear time; the same time bound holds for updates. The constant e is used here
to minimize the space used: more generally, we can set $w = b/\varepsilon$ and $d = \log_b\frac{1}{\delta}$ for any
$b > 1$ to get the same accuracy guarantee. Choosing $b = e$ minimizes the space used,
since this solves $\frac{d}{db}\left(\frac{b}{\ln b}\right) = 0$, giving a cost of $(2 + \frac{e}{\varepsilon})\ln\frac{1}{\delta}$ words. For implementations,

it may be preferable to use other (integer) values of b for simpler computations or faster 
updates. 
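
The choice $b = e$ can be checked with a short calculation. The derivation below is a sketch that ignores the ceilings on w and d and writes $\mathrm{d}/\mathrm{d}b$ for the derivative to avoid confusion with the depth d.

```latex
\begin{align*}
w\,d &= \frac{b}{\varepsilon}\,\log_b\frac{1}{\delta}
      = \frac{b}{\ln b}\cdot\frac{1}{\varepsilon}\,\ln\frac{1}{\delta}, \\
\frac{\mathrm{d}}{\mathrm{d}b}\left(\frac{b}{\ln b}\right)
     &= \frac{\ln b - 1}{\ln^2 b} = 0
      \quad\Longleftrightarrow\quad b = e, \\
\left. w\,d \right|_{b=e} + 2d
     &= \left(\frac{e}{\varepsilon} + 2\right)\ln\frac{1}{\delta}.
\end{align*}
```

The derivative is negative for $1 < b < e$ and positive for $b > e$, so $b = e$ is the minimum; the extra $2d$ words account for the two-word hash functions.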

The best known previous result using sketches was in [4]: there sketches were used to
approximate point queries. Results were stated in terms of the frequencies of individual
items. For arbitrary distributions, the space used is $O(\frac{1}{\varepsilon^2}\log\frac{1}{\delta})$, and the dependency on
$\varepsilon$ is $\frac{1}{\varepsilon^2}$ in every case considered.

In the full version of this paper* we describe how all existing sketch constructions
can be viewed as variations of a common construction. This emphasizes the importance 
of our attempt to find the simplest sketch construction which has the best guarantees 
and smallest constants. A similar result holds when entries of the implicit vector a may 
be negative, which is the general case. Details of this appear in the full version of this 
paper. 

* To appear in Journal of Algorithms 
4.2 Inner Product Query

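Estimation Procedure. Set $(\widehat{a \odot b})_j = \sum_{k=1}^{w} count_a[j, k] \cdot count_b[j, k]$. Our
estimation of $Q(a, b)$ for non-negative vectors a and b is $\widehat{a \odot b} = \min_j (\widehat{a \odot b})_j$.

Theorem 2. $a \odot b \le \widehat{a \odot b}$ and, with probability $1 - \delta$, $\widehat{a \odot b} \le a \odot b + \varepsilon\|a\|_1\|b\|_1$.

Proof.

$$(\widehat{a \odot b})_j = \sum_{i=1}^{n} a_i b_i + \sum_{p \ne q,\ h_j(p) = h_j(q)} a_p b_q$$

Clearly, $a \odot b \le (\widehat{a \odot b})_j$ for non-negative vectors. By pairwise independence of h,

$$E\big((\widehat{a \odot b})_j - a \odot b\big) = \sum_{p \ne q} \Pr[h_j(p) = h_j(q)]\, a_p b_q \le \sum_{p \ne q} \frac{\varepsilon\, a_p b_q}{e} \le \frac{\varepsilon\|a\|_1\|b\|_1}{e}$$

So, by the Markov inequality, $\Pr[\widehat{a \odot b} - a \odot b > \varepsilon\|a\|_1\|b\|_1] \le \delta$, as required. □

The space and time to produce the estimate is $O(\frac{1}{\varepsilon}\log\frac{1}{\delta})$. Updates are performed in
time $O(\log\frac{1}{\delta})$.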
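
The inner product estimate can be sketched as follows. Both sketches must be built with the same hash functions, otherwise the row-wise products are meaningless. The helper names, the dictionary representation of the vectors, and the $((ax + b) \bmod P) \bmod w$ hash family are assumptions made for this illustration.

```python
import math
import random

PRIME = (1 << 61) - 1  # assumed to exceed every item identifier


def make_params(eps, delta, seed=0):
    """Shared width, depth and hash functions for all sketches to be combined."""
    w = math.ceil(math.e / eps)
    d = math.ceil(math.log(1.0 / delta))
    rng = random.Random(seed)
    hashes = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(d)]
    return w, d, hashes


def sketch_of(vector, params):
    """Build the d-by-w count array of a vector given as {item: value}."""
    w, d, hashes = params
    count = [[0] * w for _ in range(d)]
    for i, value in vector.items():
        for j, (a, b) in enumerate(hashes):
            count[j][((a * i + b) % PRIME) % w] += value
    return count


def inner_product_estimate(count_a, count_b):
    """min over rows j of sum_k count_a[j][k] * count_b[j][k]."""
    return min(sum(x * y for x, y in zip(row_a, row_b))
               for row_a, row_b in zip(count_a, count_b))
```

For non-negative vectors each row sum contains the true inner product plus non-negative collision terms, so the estimate never falls below $a \odot b$.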

Join size estimation is important in database query planners in order to determine 
the best order in which to evaluate queries. The join size of two database relations on a 
particular attribute is the number of items in the cartesian product of the two relations
which agree on the value of that attribute. We assume without loss of generality that attribute
values in the relation are integers in the range $1 \ldots n$. We represent the relations being
joined as vectors a and b so that the value $a_i$ represents the number of tuples which have
value i in the first relation, and $b_i$ similarly for the second relation. Then clearly $a \odot b$
is the join size of the two relations. Using sketches allows estimates to be made in the
presence of items being inserted to and deleted from relations. The following corollary 
follows from the above theorem. 

Corollary 1. The join size of two relations on a particular attribute can be approximated
up to $\varepsilon\|a\|_1\|b\|_1$ with probability $1 - \delta$, by keeping space $O(\frac{1}{\varepsilon}\log\frac{1}{\delta})$.

Previous results have used the “tug-of-war” sketches [1]. However, here some care is
needed in the comparison of the two methods: the prior work gives guarantees in terms
of the $L_2$ norm of the underlying vectors, with additive error of $\varepsilon\|a\|_2\|b\|_2$; here, the
result is in terms of the $L_1$ norm. In some cases, the $L_2$ norm can be quadratically smaller
than the $L_1$ norm. However, when the distribution of items is non-uniform, for example
when certain items contribute a large amount to the join size, then the two norms are
closer, and the guarantees of the CM sketch method are closer to the existing method. As
before, the space cost of previous methods was $\Omega(1/\varepsilon^2)$, so there is a significant space
saving to be had with CM sketches.

4.3 Range Query 

Estimation Procedure. We will adopt the use of dyadic ranges from [13]: a dyadic range
is a range of the form $[x2^y + 1 \ldots (x+1)2^y]$ for parameters x and y. Keep $\log_2 n$ CM
sketches, in order to answer range queries $Q(l, r)$ approximately. Any range query can be
reduced to at most $2\log_2 n$ dyadic range queries, which in turn can each be reduced to a
single point query. Each point in the range $[1 \ldots n]$ is a member of $\log_2 n$ dyadic ranges,
one for each y in the range $0 \ldots \log_2(n) - 1$. A sketch is kept for each set of dyadic
ranges of length $2^y$, and each of these is updated for every update that arrives. Then, given
a range query $Q(l, r)$, compute the at most $2\log_2 n$ dyadic ranges which canonically
cover the range, and pose that many point queries to the sketches, returning the sum of
the queries as the estimate. Let $a[l, r] = \sum_{i=l}^{r} a_i$ be the answer to the query $Q(l, r)$ and
let $\hat{a}[l, r]$ be the estimate using the procedure above.

Theorem 3. $a[l, r] \le \hat{a}[l, r]$ and with probability at least $1 - \delta$,

$$\hat{a}[l, r] \le a[l, r] + 2\varepsilon \log n\, \|a\|_1.$$

Proof. Applying the inequality of Theorem 1, then $a[l, r] \le \hat{a}[l, r]$. Consider each
estimator used to form $\hat{a}[l, r]$; the expectation of the additive error for any of these is
$2 \log n\, \frac{\varepsilon}{e}\|a\|_1$, by linearity of expectation of the errors of each point estimate. Applying
the same Markov inequality argument as before, the probability that this additive error
is more than $2\varepsilon \log n\, \|a\|_1$ for any estimator is less than $\frac{1}{e}$; hence, for all of them the
probability is at most $\delta$. □

The time to compute the estimate or to make an update is $O(\log(n)\log\frac{1}{\delta})$. The space
used is $O(\frac{\log(n)}{\varepsilon}\log\frac{1}{\delta})$.
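
The range query procedure combines a canonical dyadic decomposition with one sketch per level. The following is a minimal sketch under stated assumptions: the names `dyadic_cover`, `CountMin` and `RangeSketch` are hypothetical, the hash family is the usual $((ax + b) \bmod P) \bmod w$ approximation, and one extra top level is kept so that a query covering the whole range needs no special case (the text keeps levels $0 \ldots \log_2(n) - 1$).

```python
import math
import random

PRIME = (1 << 61) - 1


class CountMin:
    def __init__(self, eps, delta, seed):
        self.w = math.ceil(math.e / eps)
        self.d = math.ceil(math.log(1.0 / delta))
        rng = random.Random(seed)
        self.hashes = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(self.d)]
        self.count = [[0] * self.w for _ in range(self.d)]

    def _bucket(self, j, i):
        a, b = self.hashes[j]
        return ((a * i + b) % PRIME) % self.w

    def update(self, i, c):
        for j in range(self.d):
            self.count[j][self._bucket(j, i)] += c

    def point_query(self, i):
        return min(self.count[j][self._bucket(j, i)] for j in range(self.d))


def dyadic_cover(l, r):
    """Canonical cover of [l, r] (1-indexed, inclusive) by pairs (y, x),
    each standing for the dyadic range [x*2^y + 1, (x+1)*2^y]."""
    lo, hi = l - 1, r          # work on the half-open interval [lo, hi)
    cover = []
    while lo < hi:
        y = 0
        # Grow the block while it stays aligned at lo and inside the interval.
        while lo % (1 << (y + 1)) == 0 and lo + (1 << (y + 1)) <= hi:
            y += 1
        cover.append((y, lo >> y))
        lo += 1 << y
    return cover


class RangeSketch:
    def __init__(self, n, eps, delta, seed=0):
        levels = n.bit_length()   # levels 0 .. floor(log2 n), one sketch each
        self.sketches = [CountMin(eps, delta, seed + y) for y in range(levels)]

    def update(self, i, c=1):
        for y, sketch in enumerate(self.sketches):
            sketch.update((i - 1) >> y, c)   # index of the level-y block holding i

    def range_query(self, l, r):
        return sum(self.sketches[y].point_query(x) for y, x in dyadic_cover(l, r))
```

For example, `dyadic_cover(2, 7)` yields the blocks [2,2], [3,4], [5,6] and [7,7], and with non-negative updates the summed point estimates never fall below the true range sum.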

The above theorem states the bound for the standard CM sketch size. The guarantee 
will be more useful when stated without terms of $\log n$ in the approximation bound. This
can be changed by increasing the size of the sketch, which is equivalent to rescaling $\varepsilon$. In
particular, if we want to estimate a range sum correct up to $\varepsilon'\|a\|_1$ with probability $1 - \delta$
then set $\varepsilon = \frac{\varepsilon'}{2\log n}$. The space used is $O(\frac{\log^2(n)}{\varepsilon'}\log\frac{1}{\delta})$. An obvious improvement of

this technique in practice is to keep exact counts for the first few levels of the hierarchy, 
where there are only a small number of dyadic ranges. This improves the space, time and 
accuracy of the algorithm in practice, although the asymptotic bounds are unaffected. 

The best previous bounds for this problem in the turnstile model are given
in [13], where range queries are answered by keeping $O(\log n)$ sketches, each of size
$O(\frac{1}{\varepsilon'^2}\log(n)\log\frac{\log n}{\delta'})$ to give approximations with additive error $\varepsilon'\|a\|_1$ with
probability $1 - \delta'$. Thus the space used there is $O(\frac{\log^2(n)}{\varepsilon'^2}\log\frac{\log n}{\delta'})$ and the time for updates is
linear in the space used. The CM sketch improves the space and time bounds; it improves 
the constant factors as well as the asymptotic behavior. The time to process an update is 
significantly improved, since only a few entries in the sketch are modified, rather than a 
linear number. 



5 Applications of Count-Min Sketches 



By using CM sketches, we show how to improve best known time and space bounds for 
the two problems from Section 2. 
5.1 Quantiles in the Turnstile Model

In [13] the authors showed that finding the approximate $\phi$-quantiles of the data subject
to insertions and deletions can be reduced to the problem of computing range sums.
Put simply, the algorithm is to do binary searches for ranges $1 \ldots r$ whose range sum
$a[1, r]$ is $k\phi\|a\|_1$ for $1 \le k \le \frac{1}{\phi} - 1$. The method of [13] uses Random Subset Sums to
compute range sums. By replacing this structure with Count-Min sketches, the improved
results follow immediately. By keeping $\log n$ sketches, one for each dyadic range and
setting the accuracy parameter for each to be $\varepsilon/\log n$ and the probability guarantee to
$\delta\phi/\log(n)$, the overall probability guarantee for all $1/\phi$ quantiles is achieved.
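
The binary search over prefix range sums can be sketched independently of how the range sums are obtained. In the illustration below `range_sum(l, r)` is a hypothetical callback: an exact oracle is enough to follow the logic, and a sketch-based estimator would be plugged in to obtain the approximate version described above. Returning the smallest r whose prefix sum reaches the target rank is an assumption about tie handling.

```python
def approximate_quantiles(n, total, phi, range_sum):
    """For k = 1 .. 1/phi - 1, find the smallest r in 1..n with
    range_sum(1, r) >= k * phi * total, by binary search on r."""
    quantiles = []
    steps = round(1 / phi)
    for k in range(1, steps):
        target = k * phi * total
        lo, hi = 1, n
        while lo < hi:
            mid = (lo + hi) // 2
            if range_sum(1, mid) >= target:
                hi = mid        # prefix already heavy enough: look left
            else:
                lo = mid + 1    # prefix too light: look right
        quantiles.append(lo)
    return quantiles
```

Each quantile costs about $\log_2 n$ range-sum queries, which is where the $\log(n)$ factor in the query time comes from.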

Theorem 4. $\varepsilon$-approximate $\phi$-quantiles can be found with probability at least $1 - \delta$ by
keeping a data structure with space $O(\frac{1}{\varepsilon}\log^2(n)\log\frac{\log n}{\phi\delta})$. The time for each insert or
delete operation is $O(\log(n)\log\frac{\log n}{\phi\delta})$, and the time to find each quantile on demand
is $O(\log(n)\log\frac{\log n}{\phi\delta})$.

Choosing CM sketches over Random Subset Sums improves both the query time
and the update time from $O(\frac{1}{\varepsilon^2}\log^2(n)\log\frac{\log n}{\phi\delta})$, by a factor of more than $\frac{1}{\varepsilon^2}\log n$. The
space requirements are also improved by a factor of at least $\frac{1}{\varepsilon}$.

It is illustrative to contrast our bounds with those for the problem in the weaker 
Cash Register Model where items are only inserted (recall that in our stronger Turnstile 
model, items are deleted as well). The previously best known space bounds for finding
approximate quantiles are $O(\frac{1}{\varepsilon}(\log^2\frac{1}{\varepsilon} + \log^2\log\frac{1}{\delta}))$ space for a randomized sampling
solution and $O(\frac{1}{\varepsilon}\log(\varepsilon\|a\|_1))$ space for a deterministic solution [14]. These bounds are not
completely comparable, but our result is the first on the more powerful Turnstile model 
to be comparable to the Cash Register model bounds in the leading 1/e term. 



5.2 Heavy Hitters in the Turnstile Model 

We adopt the solution given in [5], which describes a divide and conquer procedure to
find the heavy hitters. This keeps sketches for computing range sums: $\log n$ different
sketches, one for each different dyadic range. When an update arrives, then
each of these is updated as before. In order to find all the heavy hitters, a parallel
binary search is performed, descending one level of the hierarchy at each step. Nodes
in the hierarchy (corresponding to dyadic ranges) whose estimated weight exceeds the
threshold of $(\phi + \varepsilon)\|a\|_1$ are split into two ranges, and investigated recursively. All single
items found in this way whose approximated count exceeds the threshold are output.
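
The descent through the dyadic hierarchy can be sketched as follows. The callback `block_weight(y, x)` is hypothetical: it stands for the estimated total count of the dyadic range $[x2^y + 1 \ldots (x+1)2^y]$, which in the scheme above would come from the level-y sketch (an exact oracle is enough to follow the logic). The sketch explores nodes with an explicit stack rather than level by level, which visits the same nodes, and it treats a weight equal to the threshold as heavy.

```python
def heavy_hitters(n, threshold, block_weight):
    """Return all items i in 1..n that are reached by repeatedly splitting
    dyadic ranges whose estimated weight is at least `threshold`."""
    top = (n - 1).bit_length()   # smallest y with 2^y >= n
    frontier = [(top, 0)]        # start from the range covering 1..n
    found = []
    while frontier:
        y, x = frontier.pop()
        if block_weight(y, x) < threshold:
            continue             # too light to contain a heavy item: prune
        if y == 0:
            found.append(x + 1)  # single item [x + 1, x + 1]
        else:
            frontier.append((y - 1, 2 * x))      # left half
            frontier.append((y - 1, 2 * x + 1))  # right half
    return sorted(found)
```

Because estimates built from CM sketches never underestimate (in the non-negative case), no range containing a truly heavy item is pruned; only the inclusion of light items is subject to error.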

We instead must limit the number of items output whose true frequency is less than
the fraction $\phi$. This is achieved by setting the probability of failure for each sketch to be
$\frac{\delta\phi}{2\log n}$. This is because, at each level there are at most $1/\phi$ items with frequency more
than $\phi$. At most twice this number of queries are made at each level, for all of the $\log n$
levels. Scaling $\delta$ like this and applying the union bound ensures that, over all the
queries, the total probability that any one (or more) of them overestimated by more than
a fraction $\varepsilon$ is bounded by $\delta$, and so the probability that every query succeeds is at least $1 - \delta$.
It follows that
Theorem 5. The algorithm uses space $O(\frac{1}{\varepsilon}\log(n)\log\frac{2\log(n)}{\delta\phi})$, and time
$O(\log(n)\log\frac{2\log(n)}{\delta\phi})$ per update. Every item with frequency at least $(\phi + \varepsilon)\|a\|_1$ is
output, and with probability $1 - \delta$ no item whose frequency is less than $\phi\|a\|_1$ is output.

The previous best known bound appears in [5], where a non-adaptive group testing 
approach was described. Here, the space bounds agree asymptotically but have been 
improved in constant factors; a further improvement is in the nature of the guarantee: 
previous methods gave probabilistic guarantees about outputting the heavy hitters. Here, 
there is absolute certainty that this procedure will find and output every heavy hitter,
because the CM sketches never underestimate counts, and strong guarantees are given 
that no non-heavy hitters will be output. This is often desirable. 

In some situations in practice, it is vital that updates are as fast as possible, and here 
update time can be played off against search time: ranges based on powers of two can 
be replaced with an arbitrary branching factor k, which reduces the number of levels to
$\log_k n$, at the expense of costlier queries and weaker guarantees on outputting non-heavy
hitters. 



6 Conclusions 

We have introduced the Count-Min sketch, and shown how to estimate fundamental 
queries such as point, range or inner product queries as well as solve more sophisticated
problems such as quantiles and heavy hitters. The space and/or time bounds of
our solutions improve previously best known bounds for these problems. Typically the
improvement is from a $1/\varepsilon^2$ factor to $1/\varepsilon$, which is significant in real applications. Our
CM sketch is quite simple, and is likely to find many applications, including in hardware
solutions for these problems. 

We have recently applied these ideas to the problem of change detection on data 
streams [6], and we also believe that it can be applied to improve the time and space 
bounds for constructing approximate wavelet and histogram representations of data 
streams [11]. The CM Sketch can also be naturally extended to solve problems
on streams that describe multidimensional arrays rather than the unidimensional array 
problems we have discussed so far. 

Our CM sketch is not effective when one wants to compute the norms of data stream 
inputs. These have applications to computing correlations between data streams and 
tracking the number of distinct elements in streams, both of which are of great interest. 
It is an open problem to design extremely simple, practical sketches such as our CM 
Sketch for estimating such correlations and more complex data stream applications. 
Abstract. We address the problem of searching for a two-dimensional
pattern in a two-dimensional text (or image), such that the pattern can
be found even if it appears rotated and brighter or darker than its
occurrence. Furthermore, we consider approximate matching under several
tolerance models. We obtain algorithms that are almost worst-case opti- 
mal. The complexities we obtain are very close to the best current results 
for the case where only rotations, but not lighting invariance, are sup- 
ported. These are the first results for this problem under a combinatorial 
approach. 

1 Introduction 

We consider the problem of finding the occurrences of a two-dimensional pattern 
of size m × m cells in a two-dimensional text of size n × n cells, when all possible
rotations of the pattern are allowed and also pattern and text may have
differences in brightness. This stands for rotation and lighting invariant template 
matching. Text and pattern are seen as images formed by cells, each of which 
has a gray level value, also called a color. 

Template matching has numerous important applications from science to 
multimedia, for example in image processing, content based information retrieval 
from image databases, geographic information systems, processing of aerial im- 
ages, to name a few. In all these cases, we want to find a small subimage (the 
pattern) inside a large image (the text) permitting rotations (a small degree or 
any). Furthermore, pattern and text may have been photographed under differ- 
ent lighting conditions, so one may be brighter than the other. 

The traditional approach to this problem [2] is to compute the cross correla- 
tion between each text location and each rotation of the pattern template. This 

* A part of the work was done while visiting University of Chile under a researcher 
exchange grant from University of Helsinki. 
can be done reasonably efficiently using the Fast Fourier Transform (FFT), requiring
time $O(Kn^2 \log n)$, where K is the number of rotations sampled. Typically
K is $O(m)$ in the two-dimensional (2D) case, and $O(m^3)$ in the 3D case, which
makes the FFT approach very slow in practice. In addition, lighting-invariant
features may be defined in order to make the FFT insensitive to brightness. Also,
in many applications, “close enough” matches of the pattern are accepted.
To this end, the user may specify, for example, a parameter $\kappa$ such that matches
that have at most $\kappa$ differences with the pattern should be accepted, or a parameter
$\delta$ such that gray levels differing by at most $\delta$ are considered equal. The
definition of the matching conditions is called the “matching model”.

Rotation invariant template matching was first considered from a combinato- 
rial point of view in [8,9]. Since then, several fast filters have been developed for 
diverse matching models [10,7,6]. These represent large performance improve- 
ments over the FFT-based approach. The worst-case complexity of the problem 
was also studied [1,7]. However, lighting invariance has not been considered in 
this scenario. 

On the other hand, transposition invariant string matching was considered 
in music retrieval [3,11]. The aim is to search for (one-dimensional) patterns in 
texts such that the pattern may match the text after all its characters (notes) 
are shifted by some value. The reason is that such an occurrence will sound 
like the pattern to a human, albeit in a different scale. In this context, efficient 
algorithms for several approximate matching functions were developed in [12]. 

We note that transposition invariance becomes lighting invariance when we 
replace musical notes by gray levels of cells in an image. Hence, the aim of 
this paper is to enrich the existing algorithms for rotation invariant template 
matching [7] with the techniques developed for transposition invariance [12] so 
as to obtain rotation and lighting invariant template matching. It turns out that 
lighting invariance can be added at very little extra cost. The key technique ex- 
ploited is incremental distance computation; we show that several transposition 
invariant distances can be computed incrementally taking the computation done 
with the previous rotation into account in the next rotation angle. 

Let us now determine which are the reasonable matching models. In [7], 
some of the models considered were useful only for binary images, a case where 
obviously we are not interested in this paper. We will address models that make 
sense for gray level images. We define three transposition-invariant distances: 

$d_H^{t,\delta}$, which counts how many pattern and text cells differ by more than $\delta$; $d_{MAD}^{t,\kappa}$,
which is the maximum color difference between pattern and text cells when up to
$\kappa$ outliers are permitted; and $d_{SAD}^{t,\kappa}$, which is the sum of absolute color differences
between pattern and text cells permitting up to $\kappa$ outliers. Table 1 shows our
complexities to compute these distances for every possible rotation of a pattern
centered at a fixed text position. Variable $\sigma$ is the number of different gray levels
(assume $\sigma = \infty$ if the alphabet is not a finite discrete range). A lower bound to
this problem is $\Omega(m^3)$, achieved in [7] without lighting invariance.

We also define two search problems, consisting in finding all the transposition-invariant
rotated occurrences of P in T such that: (1) there are at most $\kappa$ cells
of P differing by more than $\delta$ from their text cell ($\delta$-matching); or (2) the sum
Table 1. Worst-case complexities to compute the different distances defined.

Distance                Complexity
$d_H^{t,\delta}$        $\min(\log m,\ \sigma + (\delta + 1))\, m^3$
$d_{MAD}^{t,\kappa}$    $(\min(\kappa, \sigma) + \log \min(m, \sigma))\, m^3$
$d_{SAD}^{t,\kappa}$    $(\min(\kappa, \sigma) + \log \min(m, \sigma))\, m^3$



of absolute differences between cells in P and T, except for $\kappa$ outliers, does not
exceed $\gamma$ ($\gamma$-matching). Note that $\delta$-matching can be solved by examining every
text cell and reporting it if $d_H^{t,\delta}(P,T) \le \kappa$, or if $d_{MAD}^{t,\kappa}(P,T) \le \delta$, around the
text cell. Hence any $O(f(m))$ algorithm for computing $d_H^{t,\delta}(P,T)$ or $d_{MAD}^{t,\kappa}(P,T)$ yields
an $O(f(m)\,n^2)$ algorithm for $\delta$-matching. Similarly, $\gamma$-matching can be reduced
to checking whether $d_{SAD}^{t,\kappa}(P,T) \le \gamma$ around each cell. Without transposition
invariance all searching worst cases are $O(m^3 n^2)$ [7].

We remark that we have developed algorithms that work on arbitrary al- 
phabets, but we have also taken advantage of the case where the alphabet is a 
discrete range of integer values. 

A full version of this paper [5] also considers ($\delta$, $\gamma$)-matching and optimal
average case search complexities. 

2 Definitions 

Let T = T[1..n, 1..n] and P = P[1..m, 1..m] be arrays of unit squares, called
cells, in the (x, y)-plane. Each cell has a value in an alphabet called $\Sigma$, sometimes
called its gray level or its color. A particular case of interest is that of $\Sigma$ being
a finite integer range of size $\sigma$. The corners of the cell for T[i,j] are (i − 1, j − 1),
(i, j − 1), (i − 1, j) and (i, j). The center of the cell for T[i,j] is (i − 1/2, j − 1/2).
The array of cells for pattern P is defined similarly. The center of the whole
pattern P is the center of the cell in the middle of P. Precisely, assuming for
simplicity that m is odd, the center of P is the center of cell P[(m+1)/2, (m+1)/2].

Assume now that P has been moved on top of T using a rigid motion (trans- 
lation and rotation), such that the center of P coincides exactly with the center 
of some cell of T (center-to-center assumption). The location of P with respect
to T can be uniquely given as $((i,j), \theta)$ where (i, j) is the cell of T that matches
the center of P, and $\theta$ is the angle between the x-axis of T and the x-axis of P.
The (approximate) occurrence between T and P at some location is defined by
comparing the values of the cells of T and P that overlap. We will use the centers
of the cells of T for selecting the comparison points. That is, for the pattern at
location $((i,j), \theta)$, we look which cells of the pattern cover the centers of the
cells of the text, and compare the corresponding values of those cells (Fig. 1).

More precisely, assume that P is at location $((i,j), \theta)$. For each cell T[r,s]
of T whose center belongs to the area covered by P, let P[r',s'] be the cell of
P such that the center of T[r,s] belongs to the area covered by P[r',s']. Then
M(T[r,s]) = P[r',s'], that is, our algorithms compare the cell T[r,s] of T against
the cell M(T[r,s]) of P.
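
As an illustration of the matching function M, the following minimal sketch maps a text cell to the pattern cell covering its center. It is not taken from the paper: the function name `matching_cell`, the counterclockwise rotation convention, and the rule that a center lying exactly on a cell boundary goes to the higher-indexed cell are all assumptions made for the example.

```python
import math

def matching_cell(m, i, j, theta, r, s):
    """Return the 1-based indices (r', s') of the pattern cell that covers
    the center of text cell T[r, s], or None if P does not cover it.

    Assumes an m x m pattern whose center sits on the center of T[i, j]
    and which is rotated counterclockwise by theta (hypothetical convention).
    """
    # Offset of the text cell center from the pattern center.
    dx, dy = r - i, s - j
    c, sn = math.cos(theta), math.sin(theta)
    # Rotate the offset by -theta to express it in the pattern's own frame,
    # then shift so that the pattern occupies the square [0, m) x [0, m).
    px = m / 2 + dx * c + dy * sn
    py = m / 2 - dx * sn + dy * c
    if not (0 <= px < m and 0 <= py < m):
        return None
    # Cell P[r', s'] covers [r' - 1, r') x [s' - 1, s').
    return int(px) + 1, int(py) + 1
```

With m = 3 and no rotation, the text cell under the pattern center maps to the middle pattern cell P[2, 2]; cells more than one position away from the center are not covered.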
Fig. 1. Each text cell is matched against the pattern cell that covers the center of the
text cell. 



Hence the matching function M is a function from the cells of T to the
cells of P. Now consider what happens to M when angle $\theta$ grows continuously,
starting from $\theta = 0$. Function M changes only at the values of $\theta$ such
that some cell center of T hits some cell boundary of P. It was shown in
[8] that this happens $O(m^3)$ times, when P rotates full $2\pi$ radians. This result
was shown to be also a lower bound in [1]. Hence there are $O(m^3)$ relevant
orientations of P to be checked. The set of angles for $0 \le \theta \le \pi/2$ is

$$A = \left\{ \beta,\ \pi/2 - \beta \;\middle|\; \beta = \arcsin\frac{h + \frac{1}{2}}{\sqrt{i^2 + j^2}} - \arcsin\frac{j}{\sqrt{i^2 + j^2}};\ i = 1, 2, \ldots, \lfloor m/2 \rfloor;\ j = 0, 1, \ldots, \lfloor m/2 \rfloor;\ h = 0, 1, \ldots, \lfloor \sqrt{i^2 + j^2} \rfloor \right\}.$$

By symmetry, the set of possible angles $\theta$, $0 \le \theta < 2\pi$, is
$\mathcal{A} = A \cup (A + \pi/2) \cup (A + \pi) \cup (A + 3\pi/2)$.
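
The set of candidate angles in the first quadrant can be enumerated directly from the formula. The sketch below is only an illustration under stated assumptions: the name `relevant_angles` is hypothetical, combinations whose arcsine argument exceeds 1 are skipped, values falling outside $[0, \pi/2]$ are discarded, and angles are rounded to 12 decimals to merge floating-point duplicates.

```python
import math

def relevant_angles(m):
    """Sorted candidate angles in [0, pi/2] for an m x m pattern."""
    angles = set()
    half = m // 2
    for i in range(1, half + 1):
        for j in range(0, half + 1):
            dist = math.hypot(i, j)          # sqrt(i^2 + j^2)
            for h in range(0, int(math.floor(dist)) + 1):
                x = (h + 0.5) / dist
                if x > 1.0:
                    continue                 # arcsin undefined (assumption: skip)
                beta = math.asin(x) - math.asin(j / dist)
                for a in (beta, math.pi / 2 - beta):
                    if 0.0 <= a <= math.pi / 2:
                        angles.add(round(a, 12))
    return sorted(angles)
```

For m = 3 this yields the two angles $\pi/6$ and $\pi/3$; the angles for the full circle follow by adding $\pi/2$, $\pi$ and $3\pi/2$.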

Furthermore, pattern P matches at location $((i,j), \theta)$ with lighting invariance
if there is some integer transposition t such that T[r,s] + t = P[r',s'] for all [r',s']
in the area of P.

Once the position and rotation $((i,j), \theta)$ of P in T define the matching function,
we can compute different kinds of distances between the pattern and the
text. Lighting-invariant versions of the distances choose the transposition minimizing
the basic distance. Interesting distances for gray level images follow.



Hamming Distance (H): The number of times $T[r,s] \ne P[r',s']$ occurs,
over all the cells of P, that is,
$d_H(i,j,\theta,t) = \sum_{r',s'} [\text{if } T[r,s] + t \ne P[r',s'] \text{ then } 1 \text{ else } 0]$,
and $d_H^t(i,j,\theta) = \min_t d_H(i,j,\theta,t)$. This can be
extended to distance $d_H^{\delta}$ and its transposition-invariant version $d_H^{t,\delta}$, where
colors must differ by more than $\delta$ in order to be considered different, that is,
$T[r,s] + t \notin [P[r',s'] - \delta,\ P[r',s'] + \delta]$.

Maximum Absolute Differences (MAD): The maximum value of $|T[r,s] - P[r',s']|$
over all the cells of P, that is,
$d_{MAD}(i,j,\theta,t) = \max_{r',s'} |T[r,s] + t - P[r',s']|$,
and $d_{MAD}^t(i,j,\theta) = \min_t d_{MAD}(i,j,\theta,t)$. This can be extended
to distance $d_{MAD}^{\kappa}$ and its transposition-invariant version $d_{MAD}^{t,\kappa}$, where up
to $\kappa$ pattern cells are freed from matching the text. Then the problem is to
compute the MAD distance with the best choice of $\kappa$ outliers that are not
included in the maximum.

Sum of Absolute Differences (SAD): The sum of the $|T[r,s] - P[r',s']|$ values
over all the cells of P, that is,
$d_{SAD}(i,j,\theta,t) = \sum_{r',s'} |T[r,s] + t - P[r',s']|$,
and $d_{SAD}^t(i,j,\theta) = \min_t d_{SAD}(i,j,\theta,t)$. Similarly, this distance can be
extended to $d_{SAD}^{\kappa}$ and its transposition-invariant version $d_{SAD}^{t,\kappa}$, where up to $\kappa$
pattern cells can be removed from the summation.



3 Efficient Algorithms 

In [1] it was shown that for the problem of two-dimensional pattern matching
allowing rotations the worst case lower bound is $\Omega(m^3 n^2)$. We have shown in
[7] a simple way to achieve this lower bound for any of the distances under
consideration (without lighting invariance). The idea is that we will check each
possible text center, one by one. So we have to pay $O(m^3)$ per text center to
achieve the desired complexity. What we do is to compute the distance we want
for each possible rotation, by reusing most of the work done for the previous
rotation. Once the distances are computed, it is easy to report the triples $(i, j, \theta)$
where these values are smaller than the given thresholds ($\delta$ and/or $\gamma$). Only
distances $d_H$ (with $\delta = 0$) and $d_{SAD}$ (with $\kappa = 0$) were considered.

Assume that, when computing the set of angles $A = \{\beta_1, \beta_2, \ldots\}$, we also
sort the angles so that $\beta_i < \beta_{i+1}$, and associate with each angle $\beta_i$ the set $C_i$
containing the corresponding cell centers that must hit a cell boundary at $\beta_i$.
This is done in a precomputation step that depends only on m, not on P or T.
Hence we can evaluate the distance functions (such as $d_{SAD}$) incrementally for
successive rotations of P. That is, assume that the distance has been evaluated
for $\beta_i$, then to evaluate it for rotation $\beta_{i+1}$ it suffices to re-evaluate the cells
restricted to the set $C_i$. This is repeated for each $\beta \in A$. Therefore, the total
time for evaluating the distance for P centered at some position in T, for all
possible angles, is $O(\sum_i |C_i|)$. This is $O(m^3)$ because each fixed cell center of T,
covered by P, can belong to some $C_i$ at most $O(m)$ times. To see this, note that
when P is rotated the whole angle $2\pi$, any cell of P traverses $O(m)$ cells of T.

If we want to add lighting invariance to the above scheme, a naive approach
is to run the algorithm for every possible transposition, for a total cost of
$O(n^2 m^3 \sigma)$. In case of a general alphabet there are $O(m^2)$ relevant transpositions
at each rotation (that is, each pattern cell can be made to match its
corresponding text cell). Hence the cost rises to $O(n^2 m^5)$.

In order to do better, we must be able to compute the optimal transposition
for the initial angle and then maintain it when some characters of the text
change (because the pattern has been aligned over a different text cell). If we
take $f(m)$ time to do this, then our lighting invariant algorithm becomes worst-case
time $O(n^2 m^3 f(m))$. In the following we show how we can achieve this for
each of the distances under consideration.
3.1 Distance $d_H^{t,\delta}$

As proved in [12], the optimal transposition for Hamming distance is obtained
as follows. Each cell P[r',s'], aligned to T[r,s], votes for a range of transpositions
$[P[r',s'] - T[r,s] - \delta,\ P[r',s'] - T[r,s] + \delta]$, for which it would match. If a
transposition receives v votes, then its Hamming distance is $m^2 - v$. Hence, the
transposition that receives most votes is the one yielding distance $d_H^{t,\delta}$. Let us
now separate the cases of integer and general alphabets.

Integer alphabet. The original algorithm [12] obtains $O(\sigma + |P|)$ time on integer
alphabet, by bucket-sorting the range extremes and then traversing them linearly
so as to find the most voted transposition (a counter is incremented when a range
starts and decremented when it finishes).
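
The sweep over range extremes can be sketched as follows. This is an illustrative version, not the algorithm of [12] verbatim: it uses a comparison sort instead of a bucket sort (so it runs in $O(|P| \log |P|)$ rather than $O(\sigma + |P|)$), and the function name and the input format, a list of (pattern value, text value) pairs for aligned cells, are assumptions.

```python
def best_transposition(pairs, delta):
    """Return (t, distance): a most voted transposition t and the number of
    aligned cell pairs that do not match under t within tolerance delta."""
    events = []
    for p, t in pairs:
        events.append((p - t - delta, 0))   # range starts (0 sorts before 1)
        events.append((p - t + delta, 1))   # range finishes (closed range)
    events.sort()
    votes = best_votes = 0
    best_t = None
    for x, kind in events:
        if kind == 0:
            votes += 1                      # counter incremented at a start
            if votes > best_votes:
                best_votes, best_t = votes, x
        else:
            votes -= 1                      # and decremented at a finish
    return best_t, len(pairs) - best_votes
```

For example, with `delta = 0` and pairs whose differences P − T are 3, 3, 0 and 3, the most voted transposition is 3 and one cell mismatches.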

In our case, we have to pay $O(\sigma + m^2)$ in order to find the optimal transposition
for the first rotation angle. The problem is how to recompute the optimal
transposition once some text cell T[r, s] changes its value (due to a small change 
in rotation angle). The net effect is that the range of transpositions given by the 
old cell value loses a vote and a new range gains a vote. 

We use the fact that the alphabet is an integer range, so there are $O(\sigma)$
possible transpositions. Each transposition can be classified according to the
number of votes it has. There are $m^2 + 1$ lists $L_i$, $0 \le i \le m^2$, containing the
transpositions that currently have i votes. Hence, when a range of transpositions
loses/gains one vote, the $2\delta + 1$ transpositions are moved to the lower/upper
list. An array pointing to the list node where each transposition appears is
necessary to efficiently find each of those $2\delta + 1$ transpositions. We need to keep
control of which is the highest-numbered non-empty list, which is easily done
in constant time per operation because transpositions move only from one list
to the next/previous. Initially we pay $O(\sigma + m^2)$ to initialize all the lists and
put all the transpositions in list $L_0$, then $O((\delta + 1)m^2)$ to process the votes of
all the cells, and then $O(\delta + 1)$ to process each cell that changes. Overall, when
we consider all the $O(m^3)$ cell changes, the scheme is $O(\sigma + (\delta + 1)m^3)$. This is
our complexity to compute distance $d_H^{t,\delta}$ between a pattern and a text center,
considering all possible rotations and transpositions.
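
A compact way to experiment with this scheme is sketched below. It is a simplification rather than the structure described above: instead of explicit linked lists $L_i$ it keeps, for each vote count, only the number of transpositions having that count, which is enough to track the highest non-empty list. The class name `VoteTable` and its interface are hypothetical, and cell values are assumed to lie in 0..sigma−1.

```python
class VoteTable:
    """Votes per transposition plus a count-of-counts array."""

    def __init__(self, sigma, delta, max_votes):
        self.delta = delta
        self.off = sigma - 1 + delta             # shift so indices are >= 0
        self.votes = [0] * (2 * self.off + 1)    # votes[t + off]
        self.bucket = [0] * (max_votes + 1)      # bucket[v] = #t with v votes
        self.bucket[0] = len(self.votes)
        self.top = 0                             # highest non-empty bucket

    def _shift(self, diff, step):
        # The 2*delta + 1 transpositions around diff gain or lose one vote.
        for t in range(diff - self.delta, diff + self.delta + 1):
            v = self.votes[t + self.off]
            self.bucket[v] -= 1
            self.bucket[v + step] += 1
            self.votes[t + self.off] = v + step
            if v + step > self.top:
                self.top = v + step
        # After a removal the top bucket can drop by at most one level.
        while self.top > 0 and self.bucket[self.top] == 0:
            self.top -= 1

    def add(self, p, t):
        self._shift(p - t, +1)

    def remove(self, p, t):
        self._shift(p - t, -1)

    def distance(self, cells):
        """Hamming-type distance for `cells` aligned cell pairs."""
        return cells - self.top
```

When a text cell under pattern value `p` changes from `old` to `new`, calling `remove(p, old)` followed by `add(p, new)` costs $O(\delta + 1)$, in line with the per-change cost stated above.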

General alphabet. Let us resort to a more general problem of dynamic range
voting: In the static case we have a multiset $S = \{[\ell, r]\}$ of one-dimensional
closed ranges, and we are interested in obtaining a point p that is included in
most ranges, that is, $\mathrm{maxvote}(S) = \max_p |\{[\ell, r] \in S \mid \ell \le p \le r\}|$. In the
dynamic case a new range is added to or an old one is deleted from S, and we
must be able to return maxvote(S) after each update.

Notice that our original problem of computing $d_H^{t,\delta}$ from one rotation angle
to another is a special case of dynamic range voting; multiset S is
$\{[P[r',s'] - T[r,s] - \delta,\ P[r',s'] - T[r,s] + \delta] \mid M(T[r,s]) = P[r',s']\}$ in one rotation angle, and
in the next one some T[r,s] changes its value. That is, the old range is deleted
and the new one is inserted, after which maxvote(S) is requested to compute
the distance $d_H^{t,\delta} = m^2 - \mathrm{maxvote}(S)$ in the new angle.
We show that dynamic range voting can be supported in $O(\log |S|)$ time,
which immediately gives an $O(m^3 \log m)$ time algorithm for computing $d_H^{t,\delta}$ between
a pattern and a text center, considering all rotations and transpositions.

First, notice that the point that gives maxvote(S) can always be chosen
among the endpoints of ranges in S. We store each endpoint e in a balanced
binary search tree with key e. Let us denote the leaf whose key is e simply by
(leaf) e. With each endpoint e we associate a value vote(e) (stored in leaf e) that
gives the number $|\{[\ell, r] \mid \ell \le e \le r,\ [\ell, r] \in S\}|$, where the set is considered as
a multiset (same ranges can have multiple occurrences). In each internal node
v, value maxvote(v) gives the maximum of the vote(e) values of the leaves e in
its subtree. After all the endpoints e are added and the values vote(e) in the
leaves and values maxvote(v) in the internal nodes are computed, the static case
is solved by taking the value maxvote(root) = maxvote(S) in the root node of
the tree.

A straightforward way of generalizing the above approach to the dynamic 
case would be to recompute all values vote(e) that are affected by the insertion/deletion
of a range. This would, however, take $O(|S|)$ time in the worst
case. To get a faster algorithm, we only store the changes of the votes in the 
roots of certain subtrees so that vote(e) for any leaf e can be computed by 
summing up the changes from the root to the leaf e. 

From now on, we refer to vote(e) and maxvote(v) as virtual values, and replace
them with counters diff(v) and values maxdiff(v). Counters diff(v) are defined
implicitly so that for all leaves of the tree it holds

$$\mathrm{vote}(e) = \sum_{v \in \mathrm{path}(root, e)} \mathrm{diff}(v), \qquad (1)$$

where path(root, e) is the set of nodes in the path from the root to a leaf e
(including the leaf). We note that there are several possible ways to choose
diff(v) values so that they satisfy the definition. Values maxdiff(v) are defined
recursively as

$$\mathrm{maxdiff}(v) = \max(\mathrm{maxdiff}(v.left) + \mathrm{diff}(v.left),\ \mathrm{maxdiff}(v.right) + \mathrm{diff}(v.right)), \qquad (2)$$

where v.left and v.right are the left and right child of v, respectively. In particular,
maxdiff(e) = 0 for any leaf node e. One easily notices that

$$\mathrm{maxvote}(v) = \mathrm{maxdiff}(v) + \sum_{v' \in \mathrm{path}(root, v)} \mathrm{diff}(v'),$$

which also gives as a special case Equation (1) once we notice that maxvote(e) =
vote(e) for each leaf node e.

Our goal is to maintain diff() and maxdiff() values correctly during insertions 
and deletions. We have three different cases to consider: (i) How to compute the 
value diff(e) for a new endpoint of a range, (ii) how to update the values of 
diff() and maxdiff() when a range is inserted/deleted, and (iii) how to update 
the values during rotations to rebalance the tree. 
Case (i) is handled by storing in each leaf an additional counter end(e). It
gives the number of ranges whose rightmost endpoint is e. Assume that this
value is computed for all existing leaves. When we insert a new endpoint e, we
either find a leaf labeled e or otherwise there is a leaf e' after which e is inserted.
In the first case vote(e) remains the same and in the latter case vote(e) =
vote(e') − end(e'), because e is included in the same ranges as e' except those
that end at e'. Notice also that vote(e) = 0 in the degenerate case when e is
the leftmost leaf. The +1 vote induced by the new range whose endpoint e is
will be handled in case (ii). To make $\mathrm{vote}(e) = \sum_{v' \in \mathrm{path}(root, e)} \mathrm{diff}(v')$ hold, we fix
diff(e) so that $\mathrm{vote}(e) = \mathrm{diff}(e) + \sum_{v' \in \mathrm{path}(root, v)} \mathrm{diff}(v')$, where v is the parent
of e. Once the maxdiff() values are updated in the path from e to the root, we
can conclude that all the necessary updates are done in $O(\log |S|)$ time.

Let us then consider case (ii). Recall the one-dimensional range search on a
balanced binary search tree (see e.g. [4], Section 5.1). We use the fact that one
can find in $O(\log |S|)$ time the minimal set of nodes, say F, such that the range
$[\ell, r]$ of S is partitioned by F; the subtrees starting at nodes of F contain all
the points in $[\ell, r] \cap S$ and only them. It follows that when inserting (deleting) a
range $[\ell, r]$, we can set diff(v) = diff(v) + 1 (diff(v) = diff(v) − 1) at each $v \in F$.
This is because all the values vote(e) in these subtrees change by ±1 (including
leaves $\ell$ and r). Note that some diff(v) values may go below zero, but this does
not affect correctness. To keep also the maxdiff() values correctly updated, it is
enough to recompute the values in the nodes in the paths from each $v \in F$ to the
root using Equation (2); other values are not affected by the insertion/deletion
of the range $[\ell, r]$. The overall number of nodes that need updating is $O(\log |S|)$.

Finally, let us consider case (iii). Counters diff(v) are affected by tree rotations,
but in case a tree rotation involving e.g. subtrees v.left, v.right.left and
v.right.right takes place, values diff(v) and diff(v.right) can be “pushed” down
to the roots of the affected subtrees, and hence they become zero. Then the tree
rotation can be carried out, also maintaining subtree maxima easily.

Hence, each insertion/deletion takes $O(\log |S|)$ time, and maxvote(S) =
maxdiff(root) + diff(root) is readily available in the root node.
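
The diff/maxdiff bookkeeping can be tried out with a simpler data structure than the one described above. The sketch below swaps the balanced binary search tree for a static segment tree over a set of candidate coordinates known in advance, so cases (i) and (iii) (new endpoints and rebalancing) do not arise; only the case (ii) updates and Equation (2) are exercised. The class name `RangeVoting` and its interface are assumptions made for the example.

```python
import bisect

class RangeVoting:
    """Dynamic range voting over a fixed, pre-declared set of coordinates."""

    def __init__(self, coords):
        self.xs = sorted(set(coords))
        self.n = max(1, len(self.xs))
        self.diff = [0] * (4 * self.n)      # diff(v) counters
        self.maxdiff = [0] * (4 * self.n)   # maxdiff(v); 0 at the leaves

    def _update(self, node, lo, hi, a, b, step):
        if a > hi or b < lo:
            return
        if a <= lo and hi <= b:
            self.diff[node] += step         # node belongs to the set F
            return
        mid = (lo + hi) // 2
        left, right = 2 * node, 2 * node + 1
        self._update(left, lo, mid, a, b, step)
        self._update(right, mid + 1, hi, a, b, step)
        # Equation (2), recomputed on the way back to the root.
        self.maxdiff[node] = max(self.maxdiff[left] + self.diff[left],
                                 self.maxdiff[right] + self.diff[right])

    def _apply(self, l, r, step):
        a = bisect.bisect_left(self.xs, l)
        b = bisect.bisect_right(self.xs, r) - 1
        if a <= b:
            self._update(1, 0, self.n - 1, a, b, step)

    def insert(self, l, r):
        self._apply(l, r, +1)

    def delete(self, l, r):
        self._apply(l, r, -1)

    def maxvote(self):
        return self.maxdiff[1] + self.diff[1]
```

Each `insert` or `delete` touches $O(\log n)$ nodes for n declared coordinates, and `maxvote()` reads the answer from the root, mirroring maxdiff(root) + diff(root).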

3.2 Distance $d_{MAD}^{t,\kappa}$

Let us start with $\kappa = 0$. As proved in [12], the optimal transposition for distance
$d_{MAD}^t$ is obtained as follows. Each cell P[r',s'], aligned to T[r,s], votes for
transposition P[r',s'] − T[r,s]. Then, the optimal transposition is the average
between the minimum and maximum vote, and the $d_{MAD}^t$ distance is the difference
of maximum minus minimum, divided by two. An $O(|P|)$ algorithm followed.

We need $O(m^2)$ to obtain the optimal transposition for the first angle, zero.
Then, in order to address changes of text characters (because, due to angle
changes, the pattern cell was aligned to a different text cell), we must be able to
maintain minimum and maximum votes. Every time a text character changes,
a vote disappears and a new vote appears. We can simply maintain balanced
search trees with all the current votes so as to handle any insertion/deletion
of votes in $O(\log(m^2)) = O(\log m)$ time, knowing the minimum and maximum
at any time. If we have an integer alphabet of size $\sigma$, there are only $2\sigma + 1$
possible votes, so it is not hard to obtain $O(\log \sigma)$ complexity. Hence the $d_{MAD}^t$
distance between a pattern and a text center can be computed in $O(m^3 \log m)$
or $O(m^3 \log \min(m, \sigma))$ time, for all possible rotations and transpositions.

In order to account for up to $\kappa$ outliers, it was already shown in [12] that
it is optimal to choose them from the pairs that vote for maximum or minimum
transpositions. That is, if all the votes are sorted into a list $v_1 \ldots v_{m^2}$,
then distance $d_{MAD}^{t,\kappa}$ is the minimum among distances $d_{MAD}^t$ computed in sets
$v_1 \ldots v_{m^2-\kappa}$, $v_2 \ldots v_{m^2-\kappa+1}$, and so on until $v_{\kappa+1} \ldots v_{m^2}$. Moreover, the optimum
transposition of the i-th value of this list is simply the average of maximum and
minimum, that is, $(v_{m^2-\kappa-1+i} + v_i)/2$.
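
The window rule above is easy to check on a small example. The following sketch recomputes everything from scratch for one fixed alignment (it does not maintain the votes incrementally across rotations); the function name `mad_with_outliers` and the input format, a plain list of votes P[r',s'] − T[r,s], are assumptions.

```python
def mad_with_outliers(votes, kappa=0):
    """Return (distance, transposition) minimising the maximum absolute
    difference when up to kappa votes may be discarded as outliers."""
    v = sorted(votes)
    keep = len(v) - kappa
    if keep <= 0:
        raise ValueError("kappa must be smaller than the number of votes")
    best = None
    # Slide a window of `keep` consecutive sorted votes: the discarded
    # votes are always taken from the two extremes of the list.
    for i in range(kappa + 1):
        lo, hi = v[i], v[i + keep - 1]
        candidate = ((hi - lo) / 2, (hi + lo) / 2)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best
```

With votes 3, 3, 4, 10 the distance is 3.5 without outliers, and drops to 0.5 (at transposition 3.5) once the vote 10 may be discarded.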

So our algorithm for $d_{MAD}^{t,\kappa}$ is as follows. We make our tree threaded (each
node points to its predecessor and successor in the tree), so we can easily access
the $\kappa + 1$ smallest and largest votes. After each change in the tree, we retraverse
these $\kappa + 1$ pairs and recompute the minimum among the $v_{m^2-\kappa-1+i} - v_i$
differences. This takes $O(m^3(\kappa + \log m))$ time. In case of an integer alphabet,
since there cannot be more than $O(\sigma)$ different votes, this can be done in time
$O(m^3(\min(\kappa, \sigma) + \log \min(m, \sigma)))$.

3.3 Distance $d_{SAD}^{t,\kappa}$

Let us first consider case $\kappa = 0$. This corresponds to the SAD model of [12],
where it was shown that, if we collect votes P[r',s'] − T[r,s], then the median
vote (either one if |P| is even) is the transposition that yields distance $d_{SAD}^t$.
Then the actual distance can be obtained by using the formula for $d_{SAD}$, and an
$O(|P|)$ time algorithm was immediate.

In this case we have to pay $O(m^2)$ to compute the distance for the first
rotation, and then have to manage to maintain the median transposition and 
current distance when some text cells change their value due to small rotations. 

We maintain a balanced and threaded binary search tree for the votes, plus 
a pointer to the median vote. Each time a vote changes because a pattern cell 
aligns to a new text cell, we must remove the old vote and insert the new one. 
When insertion and deletion occur at different halves of the sorted list of votes 
(that is, one is larger and the other smaller than the median), the median may 
move by one position. This is done in constant time since the tree is threaded. 

The distance value itself can change. One change is due to the fact that one
of the votes changed its value. Given a fixed transposition, it is trivial to remove
the appropriate summand and introduce a new one in the formula for $d_{SAD}$.
Another change is due to the fact that the median position can change from
a value in the sorted list to the next or previous. It was shown in [12] how to
modify in constant time distance $d_{SAD}$ for this case: If we move from transposition
$v_j$ to $v_{j+1}$, then all the j smallest votes increase their value by $v_{j+1} - v_j$, and
the $m - j$ largest votes decrease by $v_{j+1} - v_j$ (here m denotes the number of votes,
as in [12]). Hence distance $d_{SAD}$ at the new
transposition is the value at the old transposition plus $(2j - m)(v_{j+1} - v_j)$.
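
The constant-time update can be verified numerically. In the sketch below, `n` plays the role of the number of votes in the update formula, j is the 1-based rank of the old transposition in the sorted list, and both function names are hypothetical.

```python
def sad(votes, t):
    """Sum of absolute differences between each vote and transposition t."""
    return sum(abs(v - t) for v in votes)

def sad_at_each_vote(votes):
    """Return (sorted votes, SAD values at each of them), where every value
    after the first is derived from the previous one in constant time."""
    v = sorted(votes)
    n = len(v)
    out = [sad(v, v[0])]
    for j in range(1, n):
        # Moving from v_j to v_{j+1}: j votes get farther, n - j get closer.
        out.append(out[-1] + (2 * j - n) * (v[j] - v[j - 1]))
    return v, out
```

For votes 1, 4, 6, 10 the incremental values are 17, 11, 11, 19, matching the direct sums and reaching their minimum at the two median votes.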

Hence, we can traverse all the rotations in time $O(m^3 \log m)$. This can be
reduced to $O(m^3 \log \min(m, \sigma))$ on finite integer alphabet, by noting that there
cannot be more than $O(\sigma)$ different votes, and taking some care in handling
repeated values inside single tree nodes.

To compute distance $d_{SAD}^{t,\kappa}$, we have again that the optimal values to free
from matching are those voting for minimum or maximum transpositions. If we
remove those values, then the median lies at positions $m - \lceil \kappa/2 \rceil \ldots m + \lceil \kappa/2 \rceil$
in the list of sorted votes, where m is the median position for the whole list.

Hence, instead of maintaining a pointer to the median, we maintain two pointers
to the range of $\kappa + 1$ medians that could be relevant. It is not hard to maintain
left and right pointers when votes are inserted and deleted in the set. All the
median values can be changed one by one, and we can choose the minimum distance
among the $\kappa + 1$ options. This gives us an $O(m^3(\kappa + \log m))$ time algorithm
to compute $d_{SAD}^{t,\kappa}$. On integer alphabet, this is $O(m^3(\kappa + \log \min(m, \sigma)))$, which
can be turned into $O(m^3(\min(\kappa, \sigma) + \log \min(m, \sigma)))$ by standard tricks using
the fact that there are $O(\sigma)$ possible median votes that have different values.

Abstract. In this paper we provide an explicit way to compute asymptotically
almost sure upper bounds on the bisection width of random d-regular
graphs, for any value of d. We provide the bounds for $5 \le d \le 12$.
The upper bounds are obtained from the analysis of the performance of 
a randomized greedy algorithm to find bisections of d-regular graphs. We 
also give empirical values of the size of bisection found by the algorithm 
for some small values of d and compare it with numerical approxima- 
tions of our theoretical bounds. Our analysis also gives asymptotic lower 
bounds for the size of the maximum bisection. 



1 Introduction 

Given a graph G = (V, E) with |V| = n and n even, a bisection of V is a
partition of V into two parts each of cardinality n/2, and its size is the number 
of edges crossing between the parts. A minimum bisection is a bisection of V 
with minimal size. The decision problem related to finding a minimum bisection 
is known to be NP-complete [10], even for 3-regular graphs [5]. (See for example 
[7] for further results and applications on graph bisection). The size of a minimum
bisection is called the bisection width and the min bisection problem consists of 
finding a minimum bisection in a given G. In the present paper, we give a family 
of randomized algorithms which give asymptotic upper bounds as $n \to \infty$ on
the bisection width of almost all d-regular graphs, where d is fixed.
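
To make the definitions concrete, the sketch below computes the size of a given bisection and, by exhaustive search, the bisection width of a very small graph. It is only an illustration of the definitions, with exponential running time, and is unrelated to the randomized greedy algorithm studied in this paper; the function names and the edge-list representation over vertices 0..n−1 are assumptions.

```python
from itertools import combinations

def bisection_size(edges, part):
    """Number of edges with exactly one endpoint in `part`."""
    part = set(part)
    return sum((u in part) != (v in part) for u, v in edges)

def bisection_width(n, edges):
    """Minimum bisection size of a graph on vertices 0..n-1 (n even)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    # Fix vertex 0 on one side so each bisection is examined only once.
    return min(bisection_size(edges, (0,) + rest)
               for rest in combinations(range(1, n), n // 2 - 1))
```

For the 4-cycle the width is 2, while for the complete graph on four vertices every bisection has size 4.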

Plenty of results are known on bisection width. With respect to lower bounds, 
in 1975 Fiedler gave a spectral lower bound of $\lambda_2 n/4$ applicable for any graph,
where $\lambda_2$ is the second eigenvalue of the Laplacian of the graph [9]. In 1984,
Bollobás provided a lower bound of $\left(\frac{d}{4} - \frac{\sqrt{d \ln 2}}{2}\right)n$ for almost all d-regular graphs
[3]. Later Kostochka and Melnikov proved that almost all cubic graphs have
bisection width greater than 0.101n [12]. Using spectral techniques, Bezrukov
et al. gave lower bounds of 0.082n for the bisection width of cubic Ramanujan 
graphs, and of 0.176n for the case of 4-regular Ramanujan graphs [2]. 

Regarding upper bounds, Kostochka and Melnikov proved that asymptotically
as $n \to \infty$, all d-regular graphs have bisection width of at most
$\frac{d-2}{4}n + O(d\sqrt{n}\log n)$ [12]. Later, Alon proved that for $n > 40d^9$, all d-regular
graphs have bisection width at most $\left(\frac{d}{4} - \frac{3\sqrt{d}}{32\sqrt{2}}\right)n$ [1]. More recently, Monien and

Preis [14] gave upper bounds on the bisection width of $(\frac{1}{6} + \varepsilon)n$ for 3-regular
graphs and of $(0.4 + \varepsilon)n$ for 4-regular graphs, for any $\varepsilon$, when n is larger than
some function of the chosen $\varepsilon$. To the best of our knowledge, the most recent
result on bisection width was given in [6], where it was proved that the bisection
width of a random 4-regular graph on n vertices is asymptotically smaller
than $(\frac{1}{3} + \varepsilon)n$, with probability tending to 1 (a.a.s.). This result was proved by
analysing a simple greedy algorithm, a variant of which only yielded bisections
of width 0.174n for a random cubic graph on n vertices.

The problem of finding the maximum bisection size has also received consid- 
erable attention. This problem is again NP-hard even for planar graphs [11]. It 
is known to be solvable in polynomial time for graphs of bounded treewidth [11].

The maximum bisection size is a lower bound on the maximum size (number 
of edges) of a bipartite subgraph. Locke [13] showed that a d-regular graph which 
is not complete or a cycle has a bipartite subgraph with at least $(nd/4)\,d/(d-1)$
edges if d is even, and at least $(nd/4)((d+1)/d + 2/d^2)$ edges if d is odd.
Shearer [15] improved this result to $(nd/4) + n\sqrt{d}/(8\sqrt{2})$ for triangle-free graphs,
a property which a positive fraction of random regular graphs have. Our lower 
bounds for maximum bisection in random d-regular graphs easily exceed these 
bounds. 

In Section 2 we present a basic randomized algorithm to find a (small) bisec- 
tion of a graph by 2-colouring its vertices in a greedy way. The next vertex to 
be coloured is chosen according to a prioritisation scheme. The priority depends 
on the status of a vertex with respect to the number of neighbours it has of
either colour. Many different priority schemes were considered, each specified by 
a list of the types of vertices (i.e. their possible status with respect to colours of 
neighbours) . 

This prioritisation scheme is significant both as a simplification and as a
generalisation of the method in [6], where only 3-regular and 4-regular graphs
were considered. It is a generalisation because, to each of the algorithms given
there, there are corresponding algorithms of the general type considered in this
paper which have equivalent asymptotic performance (although the algorithms
do not give identical results). It is a simplification, because the method in [6]
was to specify one or two main phases of the algorithm. In each main phase, two
types of vertex were coloured, with one of the specified types having priority.
Such detailed control of the algorithm is difficult to generalise to higher d because
of the difficulty of knowing which types of vertices might be available when
one phase ends and a new phase starts. The key idea in the present paper
is to specify which types of vertices should have priority over which others,
throughout the whole algorithm. The transitions between phases then become
automatic. It is hard to substantiate this claim without looking at the algorithm
in more detail, since even the definition of a phase becomes more delicate with
this approach. A similar effect occurred in the analysis of greedy algorithms for
finding independent sets in random regular graphs [17], but the situation there
was considerably simpler. In that case, the prioritisation was merely according
to the degree of a vertex during a deletion algorithm, whilst in the present case,
the best prioritisation list is much harder to determine. Moreover, in the present
case, the algorithm in some sense returns to a phase which it visited earlier, and
this did not happen in [17].

In Section 3, we sketch the analysis of the performance using the differential 
equation method. For any given d, we choose an appropriate priority list,
set up the equations and solve them numerically to find the asymptotic bisection
width for the random d-regular graphs under consideration. In the same section, 
we produce empirical evidence indicating that there are two types of optimal 
priority lists of the vertices: one for even values of d and the other for odd 
values of d. In Section 4, we give empirical results comparing the values obtained 
numerically from the differential equations with the bisection width obtained 
by the randomized greedy algorithm. In Section 5, we discuss the maximum 
bisection results. It should be emphasised that the main contribution of this 
paper is to give better asymptotic bounds for the bisection width of d-regular 
graphs (d > 4), and the algorithm produced in Section 2 is only of methodological 
value. 



2 The Priority-Greedy Algorithm 

In this section, we describe a family of randomized greedy procedures to find 
a bisection for d-regular graphs. We also introduce some generic notation to be 
used later. 

Given a graph, and given a partial assignment of colours red (R) and blue
(B) to its vertices, we classify the uncoloured vertices according to the colours
of their neighbours: an uncoloured vertex is of type (r, b) if it has r neighbours
coloured R and b neighbours coloured B.

For $r \le b$, we say that a pair of uncoloured vertices is a symmetric pair if their
types are (r, b) and (b, r). We then call this an (r, b)-symmetric
pair, or a symmetric pair of type (r, b).

The greedy procedure works by colouring vertices chosen randomly in symmetric
pairs, to maintain balancedness, and repeatedly uses the majority operation
(Maj), which colours each vertex of an (r, b)-symmetric pair, r < b, with the
majority colour among its neighbours and, given an (r, b)-symmetric pair with
r = b, randomly colours one vertex of the pair R and the other B.

We assume that the symmetric pair types have the priorities 0,1,2,... asso- 
ciated with them (a larger number denotes higher priority). The priority-greedy 
algorithm for random d-regular graphs is given in Figure 1.

    input prio(r, b) for all $r \le b \le d$;

    Initial step:    select two non-adjacent vertices u.a.r., colour one with R and
                     the other with B.

    Main iteration:  while there is at least one uncoloured symmetric pair do
                         let (r, b) denote the highest priority type of
                         an uncoloured symmetric pair;
                         select u.a.r. an (r, b)-symmetric pair
                         and perform Maj;

    Clean up:        colour any remaining uncoloured vertices, half of them R
                     and half B, in any manner, and output the bisection R, B.



Fig. 1. Algorithm priority-greedy for obtaining a bisection of a d-regular graph 
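
The three steps of Figure 1 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used for the experiments: the function name `priority_greedy`, the adjacency-list input and the callback `prio(r, b)` are assumptions made here, the vertex types are recomputed from scratch in every round for clarity rather than speed, and the initial pair is drawn as a random vertex plus a random non-neighbour (which is uniform over non-adjacent pairs when the graph is regular).

```python
import random

def priority_greedy(adj, prio, rng=None):
    """Heuristic bisection of a graph, following the three steps of Fig. 1.

    adj  -- list of neighbour lists of a simple graph on an even number of vertices
    prio -- hypothetical callback prio(r, b) for r <= b; larger means higher priority
    Returns (colour, cut): colour[v] is 'R' or 'B', cut is the bisection size.
    """
    rng = rng or random.Random()
    n = len(adj)
    if n % 2:
        raise ValueError("need an even number of vertices")
    colour = [None] * n
    red = [0] * n    # red[v]  = number of neighbours of v coloured R
    blue = [0] * n   # blue[v] = number of neighbours of v coloured B

    def paint(v, c):
        colour[v] = c
        counts = red if c == 'R' else blue
        for u in adj[v]:
            counts[u] += 1

    # Initial step: a random vertex and a random non-neighbour of it.
    starts = [u for u in range(n) if len(set(adj[u])) < n - 1]
    if not starts:
        raise ValueError("complete graph: no non-adjacent pair")
    u = rng.choice(starts)
    v = rng.choice([w for w in range(n) if w != u and w not in adj[u]])
    paint(u, 'R')
    paint(v, 'B')

    # Main iteration: colour the highest-priority available symmetric pair.
    while True:
        by_type = {}
        for w in range(n):
            if colour[w] is None:
                by_type.setdefault((red[w], blue[w]), []).append(w)
        available = [(r, b) for (r, b), vs in by_type.items()
                     if (r < b and (b, r) in by_type) or (r == b and len(vs) >= 2)]
        if not available:
            break
        r, b = max(available, key=lambda t: prio(*t))
        if r < b:
            x = rng.choice(by_type[(r, b)])  # majority of coloured neighbours is B
            y = rng.choice(by_type[(b, r)])  # majority of coloured neighbours is R
            paint(x, 'B')
            paint(y, 'R')
        else:
            x, y = rng.sample(by_type[(r, r)], 2)
            paint(x, 'R')
            paint(y, 'B')

    # Clean up: split whatever is left evenly (the remainder has even size).
    rest = [w for w in range(n) if colour[w] is None]
    for i, w in enumerate(rest):
        colour[w] = 'R' if i % 2 == 0 else 'B'

    cut = sum(1 for a in range(n) for c in adj[a] if a < c and colour[a] != colour[c])
    return colour, cut
```

For instance, on the 3-regular cube graph `[[v ^ 1, v ^ 2, v ^ 4] for v in range(8)]` with the priority `lambda r, b: 100 * (b - r) + r`, the call returns a balanced colouring with four vertices of each colour.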



This algorithm takes as input a predetermined priority list assigning a distinct
priority, prio(r, b), to each symmetric pair type (r, b). We impose the condition
on all priority lists that prio(0, 0) = 0 < prio(r, b) whenever $(r, b) \ne (0, 0)$.
Note that the priority of pairs (r, b) with r + b = d is immaterial since colouring 
vertices in such pairs cannot affect the remainder of the algorithm. So for sim- 
plicity we assume that all such vertices have negative priority, and only those 
with r + b < d need to be specified. 



3 Analysis: The Differential Equation System

We follow the description in [6], extending it to the d-regular setting for arbitrary
d. The algorithms considered there give equivalent results in special cases of
the priority-greedy algorithm, for particular priority lists and for d = 3 and 4.
(Notice that the algorithms described in [6] for d = 3 and d = 4 are different
from the general algorithm presented in this paper.)

One method of analysing the performance of a randomized algorithm is to use 
a system of differential equations to express the expected changes in the variables 
describing the state of the algorithm during its execution. An exposition of this 
method can be found in [18], which includes various examples of graph-theoretic 
optimisation problems. 

We use the pairing model to generate n-vertex d-regular graphs u.a.r. Briefly,
to generate such a random graph, it is enough to begin with dn points in n cells, 
and choose a random perfect matching of the points, which we call a pairing. 
The corresponding pseudograph (possibly with loops or multiple edges) has the 
cells as vertices and the pairs as edges. Any property a.a.s. true of the random 
pseudograph is also a.a.s. true of the restriction to random graphs, with no 
loops or multiple edges, and this restricted probability space is uniform (see for 
example [4,19] for a full description). 
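
The pairing model described above can be sketched as follows. This is a minimal illustration; the function names and the rejection loop with a `max_tries` cap are assumptions made here, not part of the paper. Shuffling the dn points and pairing consecutive entries yields a uniformly random perfect matching, and resampling until the pseudograph has no loops or multiple edges gives a uniform simple d-regular graph.

```python
import random

def random_pairing(n, d, rng=None):
    """Edges (as pairs of cells) of a uniformly random pairing on d*n points."""
    if (n * d) % 2:
        raise ValueError("d * n must be even")
    rng = rng or random.Random()
    # points[i] is the cell (vertex) that point i belongs to: d points per cell.
    points = [v for v in range(n) for _ in range(d)]
    rng.shuffle(points)
    # Pairing consecutive shuffled points is a uniform random perfect matching.
    return [(points[i], points[i + 1]) for i in range(0, n * d, 2)]

def is_simple(edges):
    """True if the pseudograph has neither loops nor multiple edges."""
    seen = set()
    for u, v in edges:
        if u == v:
            return False
        e = (min(u, v), max(u, v))
        if e in seen:
            return False
        seen.add(e)
    return True

def random_regular_graph(n, d, rng=None, max_tries=10000):
    """Rejection sampling: redraw the pairing until it is a simple graph."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        edges = random_pairing(n, d, rng)
        if is_simple(edges):
            return edges
    raise RuntimeError("no simple pairing found within max_tries")
```

Rejection is practical only for small d, since the probability that a pairing is simple decreases quickly as d grows.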

We consider the priority-greedy algorithm applied directly to the random 
pairing. As discussed in [18], the random pairing can be generated pair by pair, 
and at each step a point p can be chosen by any rule whatsoever, as long as the 
other point in the pair is chosen u.a.r. from the remaining unused points. We
call this step exposing the pair containing p. At each step of the priority-greedy
algorithm in which a vertex is coloured, we expose all pairs containing points in 
that vertex. 

We now give an analysis of the algorithm which is not rigorous but will
presumably yield the same bounds rigorously by introducing technical arguments
as in [6]. Informally speaking, in a typical part of the algorithm, there will be
symmetric pairs of one particular type, $(r_0, b_0)$, which are plentiful in the graph
but are quite regularly chosen in the main iteration of the algorithm. Symmetric
pairs of types with higher priority may also be regularly chosen, but will be
rare and regularly be used up entirely (at which point another pair of type
$(r_0, b_0)$ will be used). In this situation, we say that $(r_0, b_0)$ is the basic type. The
algorithm will typically pass through phases, determined by points at which,
roughly speaking, the basic type changes. A phase finishes when either symmetric
pairs with higher priorities than the current basic type become plentiful, or those
with the current basic type become very scarce. The boundaries of the phases
are best defined precisely in terms of the solution of a set of differential equations
which we now proceed to derive.

At each point in the algorithm, let $Z_{r,b}$ represent the number of uncoloured
vertices of type (r, b), and let W denote the number of points not yet involved
in exposed pairs. Then $W = \sum_{r+b<d} (d - r - b) Z_{r,b}$.

From this point onwards, we assume the reader is thoroughly familiar with the
argument in [6], and omit any justifications that are identical to those appearing there.
Let $d^{R}_{r,b}$ denote the expected contribution to $\Delta Z_{r,b}$, the increment of $Z_{r,b}$, due to
exposing the pair containing a point in a vertex u which has just been coloured
red by the priority-greedy algorithm. Then the probability that the other point
chosen in the pair is in a vertex v of type (i, j) is $(d - i - j) Z_{i,j}/(W - 1)$ (except
for a correction of size $O(1/W)$ due to the change in status of u). Hence, ignoring
terms of size $O(1/W)$, $d^{R}_{r,b} = (\alpha_{d+1,r+b} Z_{r-1,b} - \alpha_{d,r+b} Z_{r,b})/W$ for r + b < d,
where $\alpha_{x,y}$ is x - y when x > y, 0 otherwise.

In the following we continue to ignore terms of size $O(1/W)$. The equations
due to the case that u is coloured blue are $d^{B}_{r,b} = (\alpha_{d+1,r+b} Z_{r,b-1} - \alpha_{d,r+b} Z_{r,b})/W$
for r + b < d. Let $d_{r,b}$ denote the expected increment due to the colouring of a symmetric
pair. Making the assumption of having rb-symmetry (for all i and j, $Z_{i,j} = Z_{j,i}$),
and adding the effects from a point in the red vertex and a point in the blue,
gives $d_{r,b} = (\alpha_{d+1,r+b}(Z_{r,b-1} + Z_{r-1,b}) - 2\alpha_{d,r+b} Z_{r,b})/W$, for r + b < d.
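
As an illustration of the last formula, the following sketch evaluates W and the pair increments from a table of type counts. The function names and the dictionary representation of the counts are assumptions made here, and counts with a negative index are taken to be zero.

```python
def alpha(x, y):
    """alpha_{x,y} = x - y when x > y, and 0 otherwise."""
    return x - y if x > y else 0

def pair_increments(Z, d):
    """Return (W, dpair) for type counts Z, a dict mapping (r, b) to a count.

    W is the number of unexposed points, sum over r + b < d of (d - r - b) Z[r, b].
    dpair[(r, b)] is (alpha(d+1, r+b) (Z[r, b-1] + Z[r-1, b])
                      - 2 alpha(d, r+b) Z[r, b]) / W   for r + b < d.
    """
    def z(r, b):
        # Types with a negative index do not exist and count as zero.
        return Z.get((r, b), 0) if r >= 0 and b >= 0 else 0

    W = sum((d - r - b) * count for (r, b), count in Z.items() if r + b < d)
    if W == 0:
        raise ValueError("no unexposed points left")
    dpair = {}
    for r in range(d):
        for b in range(d - r):
            gain = alpha(d + 1, r + b) * (z(r, b - 1) + z(r - 1, b))
            loss = 2 * alpha(d, r + b) * z(r, b)
            dpair[(r, b)] = (gain - loss) / W
    return W, dpair
```

With d = 3 and only type (0, 0) present, say `Z = {(0, 0): 10}`, each of the two exposed points hits a vertex of type (0, 0), so that type loses two vertices per unit while types (0, 1) and (1, 0) each gain one.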

Let $\phi_{r,b}$ denote the probability of processing an (r, b)-symmetric pair at some
step in a given phase. This will be examined non-rigorously, since we to some
extent ignore the history and current state of the process. (The equations which
this heuristic argument culminates in can be used to define another process which
can be analysed rigorously; see [6] for details.) Assume that at the beginning of
a new phase, $(r_0, b_0)$ is the basic type. Then these vertices are plentiful, and
thus for a considerable part of the algorithm, no vertices of lower priority will be
chosen for Maj. We calculate the $\phi$’s for that symmetric pair type and all others
with higher priority.

Let $B'$ denote the set of types of symmetric pairs with higher priority than
the basic type, $(r_0, b_0)$, and let $B = B' \cup \{(r_0, b_0)\}$.
Given the assumption about the $\phi$’s, the expected number of points in a blue
(or red) vertex when Maj is performed is $c = \sum_{(r',b') \in B} (d - r' - b')\,\phi_{r',b'}$. This
leads to

$$\phi_{r,b} = c\,d_{r,b}, \quad (r,b) \in B', \qquad \text{and} \qquad \sum_{(r,b) \in B} \phi_{r,b} = 1. \qquad (1)$$

These three equations are easy to solve, and we find $c = (1 - \phi_{r_0,b_0})/S$, with
$S = \sum_{(r',b') \in B'} d_{r',b'}$, which yields $c = (d - r_0 - b_0)/T$, with
$T = 1 + \sum_{(r',b') \in B'} (r' + b' - r_0 - b_0)\,d_{r',b'}$. Now $\phi_{r,b}$ is determined for $(r,b) \in B'$,
and a little computation produces
$\phi_{r_0,b_0} = \bigl(1 + \sum_{(r',b') \in B'} (r' + b' - d)\,d_{r',b'}\bigr)/T$. Assuming rb-symmetry, and
assuming validity of the equations for the $\phi$’s, the expected increments of the
random variables at each iteration are given by
$E[\Delta(Z_{r,b})] = c\,d_{r,b} - (1 + \delta_{rb})\,\phi_{r,b}$,
where $\delta_{rb}$ is the Kronecker delta (1 if r = b, 0 otherwise). The terms
subtracted are due to the change in types of the symmetric pair of vertices being
coloured; in the case r = b, two vertices of type (r, r) are lost.
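
The "little computation" can be spelled out as follows. This is a sketch of the intermediate steps only, writing $B'$ for the set of types with higher priority than the basic type $(r_0, b_0)$, $B = B' \cup \{(r_0, b_0)\}$, and $S = \sum_{(r',b') \in B'} d_{r',b'}$ as in the text.

```latex
\begin{align*}
\sum_{(r,b)\in B}\phi_{r,b} = 1
  \;&\Longrightarrow\; \phi_{r_0,b_0} = 1 - cS,\\
c &= (d-r_0-b_0)\,\phi_{r_0,b_0} + c\sum_{(r',b')\in B'}(d-r'-b')\,d_{r',b'}\\
  &= (d-r_0-b_0)(1-cS) + c\sum_{(r',b')\in B'}(d-r'-b')\,d_{r',b'},\\
c\,T &= d-r_0-b_0, \qquad
  T = 1+\sum_{(r',b')\in B'}(r'+b'-r_0-b_0)\,d_{r',b'},\\
\phi_{r_0,b_0} &= 1-\frac{(d-r_0-b_0)\,S}{T}
  = \frac{1}{T}\Bigl(1+\sum_{(r',b')\in B'}(r'+b'-d)\,d_{r',b'}\Bigr).
\end{align*}
```

The third line collects the terms in $c$: the coefficient of $d_{r',b'}$ is $(d - r_0 - b_0) - (d - r' - b') = r' + b' - r_0 - b_0$.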

As done in [6] for the case d = 4, we may express the above expected
increments as a set of differential equations, where each $E[\Delta(Z_{r,b})]$ is expressed
as the derivative $Z'_{r,b}$ (all as functions of the number t of iterations). We scale
both time and the variables by dividing by n, and denote $Z_{r,b}/n$ by $z_{r,b}$, $t/n$
by $x$ and $W/n$ by $w$. Then the equations are

$$z'_{r,b} = \bigl(\alpha_{d+1,r+b}(z_{r,b-1} + z_{r-1,b}) - 2\alpha_{d,r+b}\,z_{r,b}\bigr)\frac{C}{w} - (1 + \delta_{rb})\,\theta_{r,b},$$

where $w = w(x) = \sum_{r+b<d} \alpha_{d,r+b}\,z_{r,b}$, $\theta_{r,b} = \theta_{r,b}(x)$
represents $\phi_{r,b}(t/n)$ and can be defined as before but with $Z_{r,b}$ replaced by $z_{r,b}(x)$
(and the same goes for $d_{r,b}$), and $C = C(x) = \sum_{(r',b') \in B} (d - r' - b')\,\theta_{r',b'}$ (after
manipulation of the equations above).

The increase in the size of the bisection due to a vertex of type (r, b), $r \le b$, being
coloured blue is r, and the symmetric vertex, of type (b, r), being coloured red
also increases the bisection by r. Thus, the expected increase per algorithm
step is $\sum_{(r,b) \in B} 2r\,\phi_{r,b}$. Letting z denote the bisection size (divided by n), this
suggests the equation $z' = \sum_{(r,b) \in B} 2r\,\theta_{r,b}$.

These are the differential equations for a phase with $(r_0, b_0)$ being the basic
type. The phase will end when either $\theta_{r_0,b_0} = 0$ (in which case the basic type will
have priority prio$(r_0, b_0) + 1$) or when $z_{r_0,b_0}$ begins to go negative (in which case
the basic type (r, b) in the next phase will be whichever type has highest priority
among those with $z_{r,b} > 0$). There is the possibility of a phase of zero length,
if these criteria immediately apply to the new basic type; then the next basic
type can be determined using the same rule. The variables’ initial conditions at
the start of a phase are just their values at the end of the previous phase. The
whole calculation begins with basic type (0, 1) and with all variables equal to
0, except for $z_{0,0} = 1$. The size of the bisection is represented by the value of z
(scaled up by a factor n) when the values of all the $z_{r,b}$ reach 0 simultaneously.
The last few phases are those in which the basic type $(r_0, b_0)$ has $r_0 + b_0 = d$,
and in practice, these may be skipped if the appropriate quantity is added to z.

After trying many different priority lists, solving the resulting system of
differential equations using a Runge-Kutta method of order 2, we focused on
two priority lists which appear to give the best results. The following determine
the order amongst those (r, b) with r + b < d:

List A: prio(i, j) > prio(k, l) iff $j - i > l - k$ or ($j - i = l - k$ and $i > k$).
List B: Same as List A but swapping prio(0, 2) with prio($\lfloor d/2 \rfloor - 1$, $\lfloor d/2 \rfloor$).

For example, with d = 5, List A places the types in the following order, from
lowest to highest priority: (0,0), (1,1), (2,2), (0,1), (1,2), (0,2), (1,3), (0,3), (0,4);
List B puts (0,2) before (1,2) but retains all other relative rankings.

Table 1. Results for lists A and B, rounded up

d           5       6       7       8       9       10      11      12
list A      0.5028  0.6675  0.8502  1.0391  1.2317  1.4278  1.624   1.823
list B      0.5247  0.6674  0.8590  1.0386  1.2318  1.4278  1.624   1.823
min(A,B)    0.5028  0.6674  0.8502  1.0386  1.2317  1.4278  1.624   1.823
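
The two lists can be generated mechanically. The sketch below is an illustration with hypothetical function names; it assumes the ordering shown in the d = 5 example, read from lowest to highest priority (a larger difference b - r gives higher priority, with ties broken in favour of larger r), which is the reading compatible with prio(0, 0) = 0 being the smallest priority.

```python
def list_a(d):
    """Types (r, b) with r <= b and r + b < d, from lowest to highest priority."""
    types = [(r, b) for r in range(d) for b in range(r, d) if r + b < d]
    # Sort by the difference b - r first, then by r.
    return sorted(types, key=lambda t: (t[1] - t[0], t[0]))

def list_b(d):
    """List A with the positions of (0, 2) and (floor(d/2) - 1, floor(d/2)) swapped."""
    order = list_a(d)
    i = order.index((0, 2))
    j = order.index((d // 2 - 1, d // 2))
    order[i], order[j] = order[j], order[i]
    return order

def prio_from(order):
    """Turn an ordered list into a function prio(r, b); unlisted types get -1."""
    rank = {t: k for k, t in enumerate(order)}
    return lambda r, b: rank.get((r, b), -1)
```

For d = 5, `list_a(5)` reproduces the order quoted in the example, and `list_b(5)` differs only in that (0, 2) precedes (1, 2). The function returned by `prio_from` assigns -1 to the types with r + b = d, in line with the convention that their priority is immaterial.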

List A appears, from the results of the calculations, to perform better for d
odd and List B performs better for d even. However, for larger d, this is not clearly
demonstrated to the accuracy with which we can confidently quote the results
(due to errors inherent in numerical solution of the differential equations). The
bounds obtained with Lists A and B for $d \le 12$ are given in Table 1. Machine
power was too limiting to go much further than this with sufficient accuracy,
but the further digits obtained, which we do not report here, suggested that the
ranking of Lists A and B according to the parity of d continues, at least up to
d = 12.

As mentioned above, this argument is not rigorous, and in particular the concept
of the probability measured by $\phi_{r,b}$ was not well specified. In the remainder
of this section we sketch how the results could be turned into rigorous upper
bounds using the type of argument in [6].

A major complicating factor is that the rate of change of variables is not
smooth: when a vertex of one type is coloured, the effects are different from
that of another type being processed. The values of the $\phi$'s can be estimated as
in [17] by breaking the process up into pieces called clutches in [8], separated by
the steps in which pairs of basic type are processed. Alternatively, the number
of pairs of each type in a clutch can be estimated and an expression found for
$\phi_{r,b}$. The heuristic argument above can be justified in a more direct way by
considering a different, deprioritized algorithm as in [20]. At the beginning of
the deprioritized algorithm, the first $\epsilon n$ steps each randomly choose a pair of
type (0,0) and apply Maj. This produces a plentiful supply of vertices of all
types. Then, during the algorithm proper, the type of symmetric pair to be
coloured in a given step is chosen randomly with probability $\phi_{r,b}$ for type (r, b),
where $\phi$ is calculated using equations (1). The expected changes in the values of
the $Z_{i,j}$ in one step can then be calculated (asymptotically), given their values
at the beginning of the step. The differential equations we have derived above
apply, but with the slightly different initial conditions determined by $\epsilon$. When
the variable $z_{r_0,b_0}$ corresponding to the basic type of symmetric pair reaches
0, or the corresponding $\theta_{r_0,b_0}$ reaches 0, the phase ends. Inductively, one may
apply the differential equation method (see [19]) to show that the variables of the
process with probability 1 + o(1) follow the solution of the differential equations
in each phase; that is, $Z_{r,b}(t) = n z_{r,b}(t/n) + o(n)$ for all relevant t. Finally, it
can be shown that letting $\epsilon \to 0$, the differential equation solution trajectory for
the deprioritized algorithm approaches arbitrarily close to the trajectory for the
original algorithm. The desired result follows from this.

Table 2. Size of the bisection obtained by the greedy algorithm for five graphs with
$n = 10^5$ (e5-*) and two graphs with $n = 2 \times 10^5$ (2e5-*). The asymptotic almost sure
upper bound from the differential equation analysis is given in the left column.

4 The Experimental Upper Bounds 

We have also generated a set of d-regular graphs for each d = 5 to 12, following
the method described in [16]. We repeated the algorithm 10 times on each of
the graphs, with priorities given by List A for d odd and List B for d even. The
results, for five graphs with $10^5$ vertices and two graphs with $2 \times 10^5$ vertices,
for each d = 5, ..., 12 are summarised in Table 2. The left column of the table
includes the bound via the differential equations. The mean, max and min of
the bisection values obtained using Algorithm 1 are given for each graph, and
the means are also averaged over all graphs of each of the two sizes. For any
reader interested in checking the experiment, the graphs generated can be found
at: http://www.lsi.upc.es/~mjserna/dregraphs.html

5 Maximum Bisection 

Let us consider the variation of the priority-greedy algorithm obtained by replacing
the majority operation (Maj) with the minority operation, which assigns to a vertex
the minority colour among its coloured neighbours. Let us call this variation
max priority-greedy.

Define an edge to be fully coloured when both its ends are finally coloured.
A fully coloured edge is mono-coloured if both ends have the same colour and
bicoloured if the ends have different colours. So the edges mono-coloured by
priority-greedy get bicoloured by max priority-greedy and vice versa, whenever the
vertices of the graph are treated in the same order (which happens with the
same probability, in both cases). That is, every edge that counts in the bisection
for one algorithm does not count in the other and vice versa. So, taking into
account that the total number of edges in a d-regular graph is dn/2, we have
the following complementary bounds for the maximum bisection: the size of the
maximum bisection in a random d-regular graph is a.a.s. at least $dn/2 - c_d n$,
where $c_d$ is the min value given in Table 1 in column d.
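
As a worked example, the complementary bound can be tabulated directly from the min(A,B) row of Table 1. The snippet is an illustration only; the dictionary name `C_MIN` and the function name are chosen here. Because the table entries are rounded up, the resulting coefficients err on the safe side.

```python
# min(A, B) row of Table 1: coefficient c_d of n in the bisection-width bound.
C_MIN = {5: 0.5028, 6: 0.6674, 7: 0.8502, 8: 1.0386,
         9: 1.2317, 10: 1.4278, 11: 1.624, 12: 1.823}

def max_bisection_coefficient(d):
    """Coefficient of n in the lower bound d*n/2 - c_d*n on the maximum bisection."""
    return d / 2 - C_MIN[d]

if __name__ == "__main__":
    for d in sorted(C_MIN):
        print(d, round(max_bisection_coefficient(d), 4))
```

For d = 5 this gives 5/2 - 0.5028 = 1.9972, so the maximum bisection of a random 5-regular graph is a.a.s. at least 1.9972n out of its 2.5n edges.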

6 Conclusions and Open Problems 

In this paper we have proposed a randomized greedy procedure which bounds 
the bisection width of any d-regular graph, and analyzed its typical performance 
on random d-regular graphs. The algorithm uses a predefined list of priorities. 
Furthermore, a related algorithm shows complementary bounds for the maximum
bisection size. We sketch a proof that for any given list, and any $d \ge 3$, the
values of the size of the bisection obtained by the algorithm are concentrated
around the value determined by the solution of a set of differential equations.

Experimentally, we notice that a good list of priorities is given by List A for
$d \ge 5$ odd and List B for $d \ge 6$ even. It remains an open problem to search
for other possible lists of priorities that improve the outcome of the algorithm.
In Table 1, we get the asymptotic bisection width as solution to the differential
equations (the tables reflect the constant to be multiplied by n). We may compare
with the asymptotic lower bound of Bollobás and the asymptotic value of the
upper bound of Alon, which is a deterministic one. For instance, for d = 5,
Bollobás’ lower bound yields 0.31917n, Alon’s upper bound yields 1.15118n, while
our upper bound is 0.5028n. Furthermore, the complementary lower bounds we
get on max bisection are well above the known lower bounds for all d-regular
graphs, triangle-free or not.

Moreover, as can be seen from Table 2, even for rather small values of n, the
size of the bisection obtained by the algorithm is close to the solution determined 
by the differential equations. As n grows, this phenomenon strengthens. 

As mentioned above, in [6] there is an analytic expression for the bound 
obtained on the bisection width of a random 4-regular graph, obtained from 
differential equations corresponding to the priority-greedy algorithm. When run 
for d = 4, Lists A and B give the same theoretical result, because the types 
(0,2) and (1,2) never become basic: there is only one phase, with (0,1) basic 
(see [6], where the algorithm is expressed in a different way but gives the same 
differential equations). 

Several open problems remain, the first being to improve the upper bound. 
One way to do this may be to find better priority lists. Another question is 
whether there is some simpler way to analyse these greedy algorithms rigorously. 

Abstract. We consider the problem of partitioning n integers into two
subsets of given cardinalities such that the discrepancy, the absolute
value of the difference of their sums, is minimized. The integers are
i.i.d. random variables chosen uniformly from the set $\{1, \ldots, M\}$. We
study how the typical behavior of the optimal partition depends on n, M
and the bias s, the difference between the cardinalities of the two subsets
in the partition. In particular, we rigorously establish this typical behavior
as a function of the two parameters $\kappa := n^{-1} \log_2 M$ and $b := |s|/n$
by proving the existence of three distinct “phases” in the $\kappa b$-plane,
characterized by the value of the discrepancy and the number of optimal
solutions: a “perfect phase” with exponentially many optimal solutions 
with discrepancy 0 or 1; a “hard phase” with minimal discrepancy of 
order $Me^{-\Theta(n)}$; and a “sorted phase” with a unique optimal partition
of order Mn, obtained by putting the $(s + n)/2$ smallest integers in one
subset. 



1 Introduction 

Phase transitions in random combinatorial problems have been the subject of 
much recent attention. The random optimum partitioning problem is the only 
NP-hard problem for which the existence of a sharp phase transition has been 
rigorously established, as have many detailed properties of the transition ([2], 
see [3] for a short overview) . Here we study a constrained version of the random 
optimum partitioning problem, and extend some of the results of [2] to that case. 
Complete proofs of the results announced here will be given in [1] . 

The integer optimum partitioning problem is a classic problem of combina- 
torial optimization which has been studied in the theoretical computer science 
([9], [4], [10], [11]), artificial intelligence ([8]), theoretical physics ([7], [5], [6], 
[12], [13]) and mathematics ([3], [2]) communities. The problem is to partition a 
given set of n integers into two subsets in order to minimize the absolute value of 
the difference between the sum of the integers in the two subsets, the so-called 
discrepancy. Notice that for any given set of integers, the discrepancies of all 
partitions have the same parity, namely that of the sum of the n integers. We 
call a partition perfect if its discrepancy is 0, when this sum is even, or 1, when 
this sum is odd. The decision question is whether there exists a perfect partition. 
In the uniformly random version, an instance is a given set of n i.i.d. integers
drawn uniformly at random from $\{1, 2, \ldots, M\}$. We will sometimes use the
notation $m = \log_2 M$; notice that each of the random integers has m binary bits.
Previous work had established a sharp transition as a function of the parameter
$\kappa := m/n$, characterized by a dramatic change in the probability of a perfect
partition. For M and n tending to infinity in the limiting ratio $\kappa = m/n$, the
probability of a perfect partition tends to 1 for $\kappa < 1$, while the probability tends
to 0 for $\kappa > 1$. This result was suggested by the work of one of the authors [12]
and proved in a paper by the three other authors [2].

The location of the phase transition for the unconstrained problem immediately
yields a one-dimensional phase diagram as a function of $\kappa$: For $\kappa \in (0, \kappa_c)$
with $\kappa_c = 1$, the system is in a “perfect phase” in which the probability of a
perfect partition tends to 1 as M and n tend to infinity in the fixed ratio $\kappa$.
For $\kappa \in (\kappa_c, \infty)$, the probability of a perfect partition tends to 0, and moreover,
there is a unique optimal partition. We call this the “hard phase,” since for
$\kappa > \kappa_c$, it is presumably computationally difficult to find the optimal partition.

Here we consider a constrained variant of the problem in which we require 
that the two subsets have given cardinalities; we say that the difference of the 
two cardinalities is the bias, s, of the partition. We establish the two-dimensional 
phase diagram of the random constrained integer partitioning problem as a function
of the parameters $\kappa := m/n$ and $b := |s|/n$. See Fig. 1. In addition to the
extensions of the perfect and hard phases, we establish the existence of a new 
phase which we call the “sorted phase.” 




Fig. 1. Phase diagram of the constrained integer partitioning problem. 



The sorted phase is easy to understand. One way to meet the bias constraint
is to take the $(s + n)/2$ smallest integers and put them in one subset of the
partition. We define the sorted phase as the subset of the $\kappa b$-plane where the
sorted partition is optimal. We prove that the sorted phase is given by the
condition $b > b_c := \sqrt{2} - 1$; see region I in Fig. 1. Moreover, we show that the
minimal discrepancy in this phase is of the order Mn.
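
The value of the threshold can be recovered by a short heuristic computation. This is only a sketch, not the proof referred to above: it assumes that, for large n, the rescaled integers $X_j/M$ behave like n points spread uniformly over [0, 1], so that sums can be replaced by integrals. The sorted partition puts the fraction $(1+b)/2$ of smallest integers in one subset, hence

```latex
\begin{align*}
\text{sum of the } (s+n)/2 \text{ smallest integers}
  &\approx Mn\int_0^{(1+b)/2} x\,dx = \frac{Mn\,(1+b)^2}{8},\\
\text{sum of the remaining integers}
  &\approx \frac{Mn}{2} - \frac{Mn\,(1+b)^2}{8},\\
\text{difference of the two sums}
  &\approx Mn\left(\frac{(1+b)^2}{4} - \frac{1}{2}\right).
\end{align*}
```

The difference vanishes to leading order when $(1+b)^2 = 2$, that is, at $b = \sqrt{2} - 1$. For larger b the larger subset outweighs the smaller one even when it is built from the smallest integers, by an amount of order Mn.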

Our analysis of the perfect and hard phases for $b < b_c$ is much more difficult.
In this region, we use integral representations for the number of partitions with a
given discrepancy and bias; these representations generalize those used in [2]. The
asymptotic analysis of the resulting two-dimensional random integrals leads to
saddle point equations for a saddle point described in terms of two real parameters
$\eta$ and $\zeta$. For discrepancies of order $o(M)$ (including, in particular, the case of
perfect partitions), the saddle point equations determining $\zeta$ and $\eta$ are:

$$\int_0^1 x \tanh(\zeta x + \eta)\,dx = 0, \qquad \int_0^1 \tanh(\zeta x + \eta)\,dx = -b. \qquad (1)$$

The solution $(\zeta, \eta)$ of these equations can be used to define the two convex curves
in Fig. 1. To this end, let




tanh(C -I- 77 ) — tanh(? 7 ) 

PiC,v) :=1 . 

For $(\zeta, \eta)$ a solution of (1), we then define

k_( 6) := -log2p(C,?7), Kc(b) := ^L{C,v) 

From bottom to top, the two convex curves joining $(0, b_c)$ and (1, 0) in Fig. 1
are then given by $\kappa = \kappa_-(b)$ and $\kappa = \kappa_c(b)$.

Our results prove that, in the region $\kappa < \kappa_-(b)$, with probability tending
to one as n tends to infinity (or, more succinctly, with high probability, w.h.p.)
there exist perfect partitions; see region III in Fig. 1. Moreover the number of
perfect partitions is about $2^{[\kappa_c(b) - \kappa]n}$ in this “perfect phase.” We also prove that
w.h.p. there are no perfect partitions in the region $b < b_c$ and $\kappa > \kappa_c(b)$, which we
call the “hard phase.” Our results leave open the question of what happens
in the narrow region $\kappa_- < \kappa < \kappa_c$, and also whether the optimal partition is
unique in the hard phase.

We also prove that these phase transitions correspond to qualitative changes 
in the solution space of the associated linear programming problem (LPP). In 
the actual optimum partitioning problem, each integer is put in one subset or the 
other. The relaxed version is defined by allowing any fraction of each integer to 
be put in either of the two partitions. Here we show the following. In the sorted 
phase, i.e. for $b > b_c = \sqrt{2} - 1$, w.h.p. the LPP has a unique solution given by the
sorted partition itself. For $b < b_c$, i.e. in the perfect and hard phases, w.h.p. the
relaxed minimum discrepancy is zero, and the total number of optimal basis 
solutions is exponentially large, of order ^ ). Finally, in the perfect 

and hard phases, we consider the fraction of these basis solutions whose
integer-valued components form an optimal integer partition of the subproblem with the
corresponding subset of the weights. We show that this fraction is exponentially
small. Moreover, except for the crescent-shaped region between $\kappa = \kappa_-(b)$ and
$\kappa = \kappa_c(b)$, we show that the fraction is strictly exponentially smaller in the hard
phase than in the perfect phase. This fraction thus represents some measure of
the algorithmic difficulty of the problem.

In the next section, we define the problem in detail, and precisely state our 
main results. In Section 3, we introduce our integral representation and show 
how it leads to the relevant saddle point equations. We also give a brief heuris- 
tic derivation of some of the phase boundaries. The complete proofs are quite 
involved, and are presented in the full paper version [1] of this extended abstract. 

2 Statement of Main Results 

Let $X_1, \ldots, X_n$ be n i.i.d. random variables distributed uniformly on $\{1, \ldots, M\}$.
We use P and E to denote the corresponding probability and expectation induced
by $X = (X_1, \ldots, X_n)$. We are interested in the case when M grows exponentially
with n, and define $\kappa$ as the exponential rate, i.e. $\kappa = n^{-1} \log_2 M$. To avoid trivial
counterexamples, we assume that $\kappa$ stays bounded away from 0 and $\infty$ as $n \to \infty$.

A partition of integers into two disjoint subsets is coded by an n-long binary
sequence $\sigma = (\sigma_1, \ldots, \sigma_n)$, $\sigma_j \in \{-1, 1\}$; so the subsets are $\{j : \sigma_j = 1\}$ and
$\{j : \sigma_j = -1\}$. Obviously $\sigma$ and $-\sigma$ are the codes of the same partition. Given
a partition $\sigma$, we define its discrepancy, $d(X, \sigma) = |\sigma \cdot X|$, and bias,
$s(\sigma) = \sigma \cdot e = |\{j : \sigma_j = 1\}| - |\{j : \sigma_j = -1\}|$. Here $\sigma \cdot X = \sum_{j=1}^{n} \sigma_j X_j$, and e is the
vector (1, ..., 1). Clearly $s(\sigma)$ is an integer in $\{-n, \ldots, n\}$, so let $s \in \{-n, \ldots, n\}$
and define the bias density $b = |s|/n$ so that $b \in [0, 1]$. Note that by symmetry
it suffices to consider $s(\sigma) \in \{0, \ldots, n\}$, so we will often take a non-negative
integer $s \in \{0, \ldots, n\}$, in which case s = bn. We define an optimum partition as
a partition $\sigma$ that minimizes the discrepancy $d(X, \sigma)$ among all the partitions
with bias equal to s, and a perfect partition as a partition $\sigma$ with $|d(X, \sigma)| \le 1$.
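
For small instances these definitions can be checked by exhaustive search. The following sketch is purely illustrative (the function name and return format are chosen here, and the enumeration is exponential in n): a sign sequence with bias s has exactly (n + s)/2 entries equal to +1, so it suffices to enumerate the index sets of those entries.

```python
from itertools import combinations

def optimum_partitions(xs, s):
    """Brute-force the constrained problem for a small list of integers xs.

    Returns (best, plus_sets): best is the minimum discrepancy |sigma . X| over
    all sigma with bias s, and plus_sets lists the index sets {j : sigma_j = +1}
    of the sign sequences attaining it.
    """
    n = len(xs)
    if abs(s) > n or (n + s) % 2:
        raise ValueError("s must lie in [-n, n] and have the same parity as n")
    k = (n + s) // 2           # number of entries equal to +1
    total = sum(xs)
    best, plus_sets = None, []
    for plus in combinations(range(n), k):
        # sigma . X = (sum over the +1 side) - (sum over the -1 side)
        disc = abs(2 * sum(xs[j] for j in plus) - total)
        if best is None or disc < best:
            best, plus_sets = disc, [plus]
        elif disc == best:
            plus_sets.append(plus)
    return best, plus_sets
```

For `xs = [3, 1, 4, 1, 5, 9]` and s = 0 the minimum discrepancy is 1, attained by the index sets (0, 2, 4) and (1, 3, 5); these are the codes $\sigma$ and $-\sigma$ of the same perfect partition {3, 4, 5} versus {1, 1, 9}. With s = 4 the minimum discrepancy rises to 5.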

Theorems 2, 3 and 4 below describe our main results on the phases labelled 
I, II, and III in Fig. 1 in the introduction. In the statement of these theorems we 
will use the parameters $\zeta$, $\eta$, $\kappa_c(b)$ and $\kappa_-(b)$ defined in (1) - (4). Before getting 
to the principal results, we begin with an existence statement. 

Theorem 1. Let $b < b_c$, where $b_c = \sqrt{2} - 1$. Then the saddle point equations 
(1) have a unique solution $(\zeta, \eta) = (\zeta(b), \eta(b))$. 

Let $Z_n(\ell, s) = Z_n(\ell, s; \mathbf{X})$ denote the random number of partitions $\sigma$ with 
$\sigma \cdot \mathbf{X} = \ell$ and $\sigma \cdot \mathbf{e} = s$. Since $s(\sigma)$ has the same parity as $n$, and $\mathbf{X} \cdot \sigma$ has 
the same parity as $\sum_j X_j$, we only consider values of $s$ which have the 
same parity as $n$, and values of $\ell$ which have the same parity as $\sum_j X_j$. In the 
theorems below, we will not state these restrictions explicitly. 

To formulate our results in a compact form, we use the shorthand $a_n < a$ 
($a_n > a$, resp.) instead of $\limsup a_n < a$ ($\liminf a_n > a$, resp.), even when the 
$n$-dependence of $a_n$ is only implicit, as in $\kappa = n^{-1} \log_2 M$ and $b = |s|/n$. We also 
use the notation $f_n = O_p(g_n)$ and $f_n = o_p(h_n)$ if $f_n/g_n$ is bounded in probability 
and $f_n/h_n$ goes to zero in probability, respectively. Also, as is customary, we will 
say that an event happens with high probability (w.h.p.) if the probability of 
this event approaches 1 as $n \to \infty$. In all our statements $n$, $M$, $s$ and $\ell$ will be 
integers with $n \ge 1$, $M \ge 1$ and $s \ge 0$. 

Our main results in the perfect phase are summarized in the next theorem. 

Theorem 2. Let $\ell = o(Mn^{1/2})$, $b < b_c$ and $\kappa < \kappa_-(b)$. Then w.h.p. $Z_n(\ell, s) \ge 1$ 
and 

$$Z_n(\ell, s) = 2^{[\kappa_c(b) - \kappa]n}\, e^{S_n n^{1/2} + o_p(n^{1/2})}, \tag{5}$$

where $S_n$ converges in probability to a Gaussian with mean zero and variance 
$\mathrm{Var}(\log(2 \cosh(\zeta U + \eta)))$, with $U$ uniformly distributed on $[0,1]$. Conse- 
quently, w.h.p., there exist exponentially many perfect partitions, with $\ell = 0$ if 
$\sum_j X_j$ is even, and $|\ell| = 1$ if $\sum_j X_j$ is odd. 

Our next theorem, which describes our main results on the hard phase, has 
two parts: The first shows that there are no perfect partitions above $\kappa = \kappa_c(b)$, 
and the second gives a bound on the number of optimum partitions for $\kappa > \kappa_-$. 
To state the theorem, let $d_{opt} = d_{opt}(n; s)$ denote the discrepancy of the optimal 
partition, and let $Z_{opt} = Z_{opt}(n; s)$ denote the number of optimal partitions. 

Theorem 3. Let $b < b_c$. 

1. If K > Kc{b), then there exists a 6 > Q such that with probability 1 — 
Q(g-iiog there are no perfect partitions, and moreover 

dopt > (6) 

2. If K > K-{b) and e > 0, then there exists a constant d > 0 such that 

dopt < and , (7) 

both with probability 1 — 0(6“*^'°®^”). 

Our main result on the sorted phase is the following theorem. 

Theorem 4. Let $b > b_c$. Then w.h.p. the optimal partition is uniquely obtained 
by putting the $(s+n)/2$ smallest integers $X_j$ in one part, and the remaining $(n-s)/2$ 
integers into another part. W.h.p., $d_{opt}$ is asymptotic to $\frac{Mn}{4}[(1 + b)^2 - 2]$, i.e., 
of order $Mn$. 

By this theorem, for b sufficiently large, the partition is determined by the de- 
creasing order of weights Xj , but not by the actual values of Xj . 

It is a rather common idea to approximate an optimization problem defined 
with integer-valued variables by its relaxed version, where the variables are now 
allowed to assume any value within the real intervals whose endpoints are the 
admissible values of the original integer variables. In our case, the relaxed version 
is a linear programming problem (LPP) which can be stated as follows. Find the 
minimum value $d_{opt}$ of $d$, subject to linear constraints 

$$-d \le \sum_j \sigma_j X_j \le d, \quad \sum_j \sigma_j = s, \quad \text{and} \quad -1 \le \sigma_j \le 1 \ (1 \le j \le n). \tag{8}$$
As usual, the LPP has at least one basis solution, i.e. a solution $(\sigma, d_{opt})$, which 
is an extreme (vertex) point of the polyhedron defined by the constraints (8). 
Let $N(\sigma) := |\{j : \sigma_j \in (-1, 1)\}|$ be the number of components of $\sigma$ which are 
non-integer. It is easy for the reader to verify that $N(\sigma)$ is either 0 or 2 for all 
basis solutions $\sigma$. 

Our next theorem shows that the LPP inherits the phase diagram of the 
optimum partition problem, and moreover provides a limited way to quantify 
the relative algorithmic difficulty of the optimal partition problem in the three 
regions. For $b > b_c$ the solutions of the initial partition problem and of its LPP 
version coincide. For $b < b_c$ they are very far apart, in terms of the ratio of 
respective optimal discrepancies. To state this precisely, we define $F_n(\kappa, b)$ to be 
the fraction of basis solutions $\sigma$ with the property that the deletion of the $N(\sigma)$ 
components of $\sigma$ with values in $(-1, 1)$ produces an optimal integer partition for 
the corresponding subproblem with weights $X_i$. 

Theorem 5. 1. If $b > b_c$, then w.h.p. the sorted partition is a unique solution 
of the LPP, and thus $d_{opt}^{LPP} = \Theta(Mn)$ and $F_n(\kappa, b) = 1$. 

2. If b < be, then w.h.p. d^^^^ = 0. In addition, w.h.p. there are 

basis solutions, each having either none or exactly two components at yf ±1. 

3. If b < be, then w.h.p. F„(/t, 6) = for n < K-{b), and 

$2^{-[\kappa_c(b) + o(1)]n} \le F_n(\kappa, b) \le 2^{-[\kappa_-(b) + o(1)]n}$ for $\kappa > \kappa_-(b)$. 

Remark 1. (i) If one assumes that the number of optimal partitions $Z_{opt}$ in the 
hard phase grows subexponentially with probability at least 1 — o{n~^) (see [1] 
for a motivation of this assumption), our upper bound on the fraction $F_n(\kappa, b)$ in 
the hard phase can be improved to match the lower bound, yielding $F_n(\kappa, b) = 
2^{-n[\kappa_c(b) + o(1)]}$ in the hard phase. 

(ii) If, on the other hand, the asymptotics of Theorem 2 hold up to Ke, more 
precisely, if one assumes that for b < be and k < Kc{b), the bound Z„(£,s) = 
2 n[Kc(b)-K+o(i))] -v^ith probability least 1 — o(n“^), then a bound of the form 

F„(k, b) = 2“"[”+°F)1 can be extended to all k < Ke. 

3 Outline of Proof Strategy 

3.1 Sorted Partitions 

We first discuss our strategy to prove that in region I, the optimal partition is 
sorted and has discrepancy of order $Mn$. Let us first recall that $M$ is assumed to 
grow exponentially with $n$, so that, in particular, $n^2 = o(M)$. As a consequence, 
w.h.p. no two weights are equal, and there is a unique reordering of the weights 
$X_1, \dots, X_n$ such that their sizes are increasing, $X_{\pi(1)} < X_{\pi(2)} < \cdots < X_{\pi(n)}$, 
where $\pi(1), \dots, \pi(n)$ is a suitable permutation of $1, \dots, n$. 

Given a bias $s \ge 0$, we need to find an optimum partition that puts $k = (s + 
n)/2$ integers in one part, and the remaining $n - k$ integers into another part. One 
such feasible partition is obtained if we select the $k$ smallest integers for the first 
part; we call it the sorted partition. It is coded by the $\sigma$ with $\sigma_{\pi(i)} = 1$ for $i \le k$ 
and $\sigma_{\pi(i)} = -1$ for $i > k$. If the total weight of the $(n - k)$ largest weights is at most 
the total weight of the $k$ smallest weights, then it is intuitively clear that the sorted 
partition is optimal. More precisely: if 

$$\delta_s(\mathbf{X}) = \sum_{j=1}^{k} X_{\pi(j)} - \sum_{j=k+1}^{n} X_{\pi(j)} \ge 0,$$

then the sorted partition is the unique optimal partition and $d_{opt} = \delta_s(\mathbf{X})$. 

To establish the boundary of the sorted phase, we thus have to determine the 
region of the phase diagram in which w.h.p. $\delta_s(\mathbf{X}) > 0$. Rather than considering 
a sorted partition with fixed bias $s = bn$, let us instead consider a random sorted 
partition with $\sigma_j = 1$ for $X_j \le M_0$, and $\sigma_j = -1$ for $X_j > M_0$, where $M_0 \in 
\{1, \dots, M\}$ is chosen so that the expected bias density is $b$, i.e. $M_0 = M(b + 1)/2$. 
The expected discrepancy of such a partition is 

$$\left[\frac{(b+1)^2}{4} - \frac{1}{2} + O(M^{-1})\right] Mn. \tag{9}$$

The right hand side of (9) is positive and of order $Mn$ iff $(b + 1)^2/4 - 1/2 > 
0$, or equivalently $b > b_c = \sqrt{2} - 1$. In Sect. 6 of [1], we prove the required 
concentration, implying that the condition $b > b_c$ is both necessary and sufficient 
for $\delta_s(\mathbf{X})$ to be, w.h.p., positive and of order $Mn$. 
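
As a sanity check on the leading term in (9), the following short calculation is a sketch under a simplifying assumption that is not spelled out in the text: the rescaled weights $X_j/M$ are treated as uniform on $[0,1]$, so that sums over $j$ become integrals, with the discretization error absorbed into the $O(M^{-1})$ term. The variable $u$ is introduced here for illustration only.

```latex
\begin{align*}
\frac{1}{Mn}\,\mathbf{E}\Bigl[\sum_{j} \sigma_j X_j\Bigr]
  &\approx \int_{0}^{(1+b)/2} u\,du \;-\; \int_{(1+b)/2}^{1} u\,du \\
  &= \frac{(1+b)^2}{8} - \Bigl(\frac{1}{2} - \frac{(1+b)^2}{8}\Bigr)
   = \frac{(1+b)^2}{4} - \frac{1}{2}.
\end{align*}
```

Setting the right hand side to zero gives $(1+b)^2 = 2$, i.e. $b = \sqrt{2} - 1 \approx 0.414$, the critical bias density $b_c$ quoted above.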



3.2 Integral Representations 

Let us now turn to the much more difficult region $b < b_c$. Guided by the results 
of [2], one might hope to prove that, as the parameter $\kappa = n^{-1} \log_2 M$ is varied, 
the model undergoes a phase transition between a region with exponentially 
many perfect partitions and a region with no perfect partitions. 

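A starting point in [2] was an integral (Fourier-inversion) type formula for 
$Z_n(\ell) = Z_n(\ell; \mathbf{X})$, the total number of $\sigma$'s such that $\sigma \cdot \mathbf{X} = \ell$, namely 

$$Z_n(\ell) = \frac{2^n}{\pi} \int_{x \in (-\pi/2, \pi/2]} \cos(\ell x) \prod_{j=1}^{n} \cos(x X_j)\, dx. \tag{10}$$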
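
A Fourier-inversion formula of this type can be checked numerically on small instances. The sketch below assumes the normalization $Z_n(\ell) = \frac{2^n}{\pi}\int_{-\pi/2}^{\pi/2} \cos(\ell x)\prod_j \cos(xX_j)\,dx$ (the prefactor is what Fourier inversion on an interval of length $\pi$ gives) and that $\ell$ has the same parity as $\sum_j X_j$. The integrand is then a trigonometric polynomial of period $\pi$, so a midpoint rule with enough nodes integrates it exactly up to rounding. The function names are hypothetical.

```python
import math
from itertools import product


def count_bruteforce(xs, ell):
    """Number of sign vectors sigma with sigma . X == ell, by enumeration."""
    return sum(
        1
        for sigma in product((-1, 1), repeat=len(xs))
        if sum(s * x for s, x in zip(sigma, xs)) == ell
    )


def count_by_integral(xs, ell):
    """Evaluate (2^n / pi) * integral of cos(ell x) * prod_j cos(x X_j).

    The integral runs over (-pi/2, pi/2]. When ell and sum(xs) have the
    same parity, every frequency in the integrand is even and bounded by
    sum(xs) + |ell|, so the midpoint rule with sum(xs) + |ell| + 1 nodes
    is exact up to floating-point error.
    """
    total = sum(xs)
    if (total - ell) % 2 != 0:
        return 0  # parity mismatch: no sign vector can reach ell
    num_points = total + abs(ell) + 1
    h = math.pi / num_points
    acc = 0.0
    for k in range(num_points):
        x = -math.pi / 2 + (k + 0.5) * h
        term = math.cos(ell * x)
        for xj in xs:
            term *= math.cos(x * xj)
        acc += term
    return round(2 ** len(xs) / math.pi * acc * h)
```

Comparing the two functions on a handful of small weight vectors is a quick way to confirm the parity restriction and the normalization before attempting any asymptotic analysis.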

We need to derive a two-dimensional counterpart of this formula for $Z_n(\ell, s)$. To 
this end, let us first note that $s = 2|\{j : \sigma_j = 1\}| - n$, so that a generic value $s$ 
of $s(\sigma)$ must meet the condition $n + s \equiv 0 \pmod 2$. In a similar way, we get that 
$\sigma \cdot \mathbf{X}$ has the same parity as the sum $\sum_j X_j$. Keeping this in mind, we have that 
on the event $\{\sum_j X_j \equiv \ell \pmod 2\}$, for $n + s \equiv 0 \pmod 2$, 

I(cr-X = £, (7-e = s) = ^ jj dxdy, (11) 

X,y^{ — 7T/2,7T/2] 

thus extending (4.6) in [2]. Multiplying both sides of the identity by 2", and 
summing over all a, we obtain 

on p p 

Zn{£, s) = cos{xXj + y) dxdy 

= 2 "Pi/ 2 (ct-X = £, CT-e = s|x), 



(12) 
where $\sigma = (\sigma_1, \dots, \sigma_n)$ is a sequence of i.i.d. Bernoulli random variables with 
probability of $\sigma_j = \pm 1$ equal to $1/2$. 

We would like to estimate the asymptotics of the integral in (12), which is 
equivalent to proving a local limit theorem for the conditional probability in (12). 
In general, to compute, via local limit theorems, the probability that some 
random variable $A$ takes the value $a$, it must be the case that the corresponding 
expectation of $A$ is near $a$. Thus the analogue of the representation (12) for the 
unconstrained problem was well adapted to the analysis of perfect partitions. 
Indeed, in that case, we wanted to estimate $\mathbf{P}_{1/2}(|\sigma \cdot \mathbf{X}| \le 1 \mid \mathbf{X})$, and we had 
$\mathbf{E}_{1/2}(\sigma \cdot \mathbf{X} \mid \mathbf{X}) = 0$. However, in the constrained case, this strategy cannot be 
expected to work for $b > 0$, since $s = bn$ is very far from the expectation of $\sigma \cdot \mathbf{e}$, 
namely $\mathbf{E}_{1/2}(\sigma \cdot \mathbf{e} \mid \mathbf{X}) = 0$. 

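To resolve this difficulty, we introduce a two-parameter family of distributions 
for $\sigma_j$ as follows: Given $\xi, \eta \in \mathbb{R}$, let $\sigma = (\sigma_1, \dots, \sigma_n)$ be a sequence of random 
variables such that, conditioned on $\mathbf{X}$, $\sigma_1, \dots, \sigma_n$ are mutually independent, and 

$$\mathbf{P}(\sigma_j = 1 \mid \mathbf{X}) = P(\xi X_j + \eta) \quad \text{with} \quad P(u) := \frac{e^{-u}}{e^{u} + e^{-u}}. \tag{13}$$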

It is not hard to show [1] that, in terms of these random variables, Zn{£,s) can 
be rewritten as 



Z^{£, s) = p(cr • X = £, cr • e = s|X) 

1 



= gnL„(4.r,;X) 






(14) 



'7r/2,7r/2] 



where 

$$L_n(\xi, \eta; \mathbf{X}) := \frac{\xi \ell}{n} + \frac{\eta s}{n} + \frac{1}{n} \sum_{j=1}^{n} \log(2 \cosh(\xi X_j + \eta)). \tag{15}$$

3.3 Saddle Point Equations and Their Solution 

Given $\xi, \eta$, we now face the problem of determining an asymptotic value of the 
local probability in (14). This will obviously be easier if the chosen parameters $\ell$ 
and $s$ are among the more likely values of $\sigma \cdot \mathbf{X}$ and $\sigma \cdot \mathbf{e}$, respectively. A natural 
choice is to take $\ell$ and $s$ equal to their expected values, that is $\mathbf{E}(\sigma \cdot \mathbf{X} \mid \mathbf{X}) = \ell$ 
and $\mathbf{E}(\sigma \cdot \mathbf{e} \mid \mathbf{X}) = s$, or explicitly, 

$$\sum_{j=1}^{n} X_j \tanh(\xi X_j + \eta) = -\ell, \qquad \sum_{j=1}^{n} \tanh(\xi X_j + \eta) = -s. \tag{16}$$

Note that the equations (16) also arise naturally in an apparently different ap- 
proach to estimate the integral in (12), the “method of steepest descent.” In 
our context, this corresponds to a complex shift of the integration path, i.e., to 
changing the path of integration for $x$ to the complex path from $-\pi/2 + i\xi$ to 
$\pi/2 + i\xi$, and the path of integration for $y$ to the complex path from $-\pi/2 + i\eta$ 
to $\pi/2 + i\eta$, where $\xi$ and $\eta$ are determined by a suitable saddle point condition. 
For general $\xi$ and $\eta$, this leads to (14), while the saddle point conditions turn 
out to be nothing but (16). 

Both approaches raise the question of uniqueness and existence of a solution 
to the saddle point equations (16). While uniqueness follows from an abstract 
convexity argument which holds independent of the value of the parameters, 
existence turns out to be much more difficult and requires that $s$ is smaller than 
some critical value $s_c = s_c(\mathbf{X})$. In the actual proof, we modify this approach a 
little since the solution $\xi = \xi(\mathbf{X})$, $\eta = \eta(\mathbf{X})$ does not lend itself to a rigorous 
analysis of $\mathbf{P}(\sigma \cdot \mathbf{X} = \ell, \sigma \cdot \mathbf{e} = s \mid \mathbf{X})$. Instead, we will resort to “suboptimal” 
$\xi = \zeta/M$, where $\zeta, \eta$ are nonrandom constants, and $(\zeta, \eta)$ is a solution of non- 
random equations, obtained by replacing the (scaled) sums in (16) with their 
weak-law limits, and the parameters $\ell$ and $s$ by their scaled versions $\ell/Mn$ and 
$b = s/n$. Since we are mainly interested in $\ell = 0, \pm 1$ (corresponding to perfect 
partitions) with $\ell/Mn = o(1)$, this leads to the equations (1) given in the in- 
troduction. As shown in Sect. 4 of [1], these equations have a (unique) solution 
$\zeta = \zeta(b)$, $\eta = \eta(b)$ iff $b < b_c = \sqrt{2} - 1$, the same $b_c$ that determines the sorted 
phase. 



3.4 Asymptotic Behavior of $Z_n(\ell, s)$ 

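Assuming that $b < b_c$, let us now consider the right hand side of (14) with $(\zeta, \eta)$ 
taken to be the solution of (1). Then we can try to prove a local limit theorem 
for the conditional probability in (14), giving the approximation 

$$\mathbf{P}(\sigma \cdot \mathbf{X} = \ell,\ \sigma \cdot \mathbf{e} = s \mid \mathbf{X}) \approx \frac{2}{\pi \sqrt{\det Q}}, \tag{17}$$

where 

$$Q = \begin{pmatrix} \mathrm{Var}(\sigma \cdot \mathbf{X}) & \mathrm{cov}(\sigma \cdot \mathbf{X}, \sigma \cdot \mathbf{e}) \\ \mathrm{cov}(\sigma \cdot \mathbf{X}, \sigma \cdot \mathbf{e}) & \mathrm{Var}(\sigma \cdot \mathbf{e}) \end{pmatrix}. \tag{18}$$

Here the (co)variances are conditioned on $\mathbf{X}$, so, e.g., $Q_{11} = \mathrm{Var}(\sigma \cdot \mathbf{X} \mid \mathbf{X})$. 
Next we appeal to the weak law of large numbers to further approximate the 
matrix elements of $Q$ by $Q_{11} \sim nM^2 R_{11}$, $Q_{12} \sim nMR_{12}$, $Q_{21} \sim nMR_{21}$ and 
$Q_{22} \sim nR_{22}$, where $R$ is a deterministic matrix depending on $(\zeta, \eta)$. This gives 

$$\mathbf{P}(\sigma \cdot \mathbf{X} = \ell,\ \sigma \cdot \mathbf{e} = s \mid \mathbf{X}) \approx \frac{1}{nM} \frac{2}{\pi \sqrt{\det R}}. \tag{19}$$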


In a similar way we approximate the exponent $L_n(\xi(\mathbf{X}), \eta(\mathbf{X}); \mathbf{X})$ in (14) by its 
weak limit, which is just the function $L(\zeta, \eta)$ introduced in (2). For $|\ell| \le 1$ and 
$M = 2^{\kappa n}$, we thus approximate $\log_2 Z_n(\ell, s)$ by $n(L(\zeta, \eta) - \kappa) = n(\kappa_c(b) - \kappa)$, 
suggesting that for $\kappa < \kappa_c(b)$ there are exponentially many perfect partitions, 
while for $\kappa > \kappa_c(b)$ there are none. 
However, this informal argument is too naive, and neglects several important 
error terms. Indeed, the above approximation for $\log_2 Z_n(\ell, s)$ could not possibly 
hold for $\kappa > \kappa_c(b)$ since $Z_n(\ell, s)$ is an integer, and thus cannot be asymptotically 
equivalent to an exponentially small, yet positive number. This means that a 
rigorous proof must be based on the condition $\kappa < \kappa_c(b)$. In fact, we will need 
the stronger condition $\kappa < \kappa_-(b)$; see [1] for the (quite painful) details. 
Abstract. Consider a game in which edges of a graph are provided a 
pair at a time, and the player selects one edge from each pair, attempting 
to construct a graph with a component as large as possible. This game 
is in the spirit of recent papers on avoiding a giant component, but here 
we embrace it. 

We analyze this game in the offline and online setting, for arbitrary 
and random instances, which provides for interesting comparisons. For 
arbitrary instances, we find a large lower bound on the competitive ratio. 
For some random instances we find a similar lower bound holds with 
high probability (whp). If the instance has $\frac{1}{4}(1 + \epsilon)n$ random edge pairs, 
when $0 < \epsilon < 0.003$ then any online algorithm generates a component 
of size $O((\log n)^{3/2})$ whp, while the optimal offline solution contains a 
component of size $\Omega(n)$ whp. For other random instances we find the 
average-case competitive ratio is much better than the worst-case bound. 
If the instance has $\frac{1}{2}(1 - \epsilon)n$ random edge pairs, with $0 < \epsilon < 0.015$, we 
give an online algorithm which finds a component of size $\Omega(n)$ whp. 



1 Introduction 

A pair of recent papers [BF01,BFW02] analyze the “Achlioptas process”, where 
a collection of random edge pairs is given a pair at a time, and the object is 
to select one edge from each pair to avoid having a (suitably defined) giant 
component in the resulting graph. Without any intelligent selection process, 
a giant component forms after about $n/2$ edges; [BF01] shows that a strategy 
exists which accepts at least $0.535n$ edges without forming a giant component; 
[BFW02] shows (among other things) that no more than about $0.964n$ edges may 
be accepted. 

It is equally natural to ask the opposite question. 

What can you do to encourage a random graph to form 
a giant component, using fewer than $(1 + \epsilon)n/2$ edges? 

In fact, it is so natural we learned that Bohman and Kravitz are studying it 
independently [BK03]. 
We now define the problem of Embracing the Giant Component (EGC) more 
precisely. An instance $I$ consists of a sequence of $m$ pairs of edges on $n$ vertices. 
(If you like, $I$ may be regarded as an element of $\bigl[\binom{[n]}{2}^{2}\bigr]^{m}$.) Edges, including those 
in a pair, may or may not be distinct. A solution is a choice of one edge from each 
pair, and its value is the order (number of vertices) of the largest component 
in the graph consisting of the chosen edges. EGC($I$) is the maximum value of a 
solution for instance $I$. 
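
To make the definition concrete, here is a minimal brute-force sketch that evaluates a solution and computes EGC($I$) for tiny instances. It assumes vertices are labelled `0..n-1` and each pair is a tuple of two edges; the names `largest_component` and `egc_bruteforce` are hypothetical, and the enumeration over all $2^m$ solutions is only meant for illustration.

```python
from itertools import product


def largest_component(n, edges):
    """Order of the largest component of the graph on vertices 0..n-1."""
    parent = list(range(n))
    size = [1] * n

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
    return max(size[find(v)] for v in range(n))


def egc_bruteforce(n, pairs):
    """Maximum value over all 2^m solutions (one edge chosen per pair)."""
    best = 0
    for picks in product((0, 1), repeat=len(pairs)):
        chosen = [pair[p] for pair, p in zip(pairs, picks)]
        best = max(best, largest_component(n, chosen))
    return best
```

For example, with `pairs = [((0, 1), (2, 3)), ((1, 2), (3, 4)), ((0, 2), (2, 3))]` on 5 vertices, choosing `(0, 1)`, `(1, 2)` and `(2, 3)` yields a component on 4 vertices, which is the best possible with three edges.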

We focus on online versions of EGC, in which we see the pairs one at a time 
and must select our edge before seeing the next pair, but we also consider offline 
versions, in which we see all $m$ pairs before making our choice. In either case, 
we consider edge pairs chosen randomly (defining an average-case behavior) or 
arbitrarily (chosen adversarially). 

In addition to being a natural graph-game problem, EGC has two other 
sources of interest. First, imagine that you are a company trying to build up a 
network of some sort. Each new link you build must be in response to a customer 
demand, and your budget allows you to spend at a rate which satisfies only 
half of all new requests. Presuming that a large connected component in the 
network is beneficial to your customers and to you, your goal is to solve an 
optimization problem very similar to EGC. Of course any real-world problem 
would be much more complicated, with different costs and benefits for different 
links, the ability to wait longer or shorter times to see more choices, and so forth, 
but it is conceivable that there are real-world problems whose mathematical core 
is captured by EGC. 

The second motivation is that EGC provides an example of a problem for 
which the competitive ratio is awful in the worst case, but, for certain parameters, 
quite reasonable in an average case; a previous example was given by [SSS02]. For 
certain other parameters, EGC has a lower bound on average-case competitive 
ratio that is almost as awful as in the worst case. 



1.1 Worst Case 

We first observe that in the worst case, it is hard to solve offline EGC exactly 
(to select edges giving a component as large as possible), or even to approximate 
it to better than some fixed factor. 

Theorem 1. Offline EGC is MAX SNP-hard. 

In the online setting, it is natural to measure performance in terms of the 
competitive ratio, the ratio $Z_{opt}/Z_{online}$ between the sizes of the components pro- 
duced by the best possible offline and online algorithms. The next theorem shows 
that in the worst case, the competitive ratio is as bad as it conceivably could be. 

Theorem 2. The worst-case competitive ratio for EGC is $(m + 1)/2$. Specifi- 
cally, for every online algorithm, there is a sequence of $m$ edge pairs for which 
the algorithm produces a collection of isolated edges, yet the optimal solution has 
a component on $m + 1$ vertices. 
As we remark after the proof of this theorem, a competitive-ratio lower bound 
of $\Omega(n/\log n)$ holds even for randomized online algorithms against an oblivious 
adversary. 

1.2 Average Case 

We define $I_{n,m}$ to be a random instance of EGC in which each edge of each pair 
is chosen independently, uniformly at random from the edge set of the complete 
graph $K_n$. 

Our main intention is to compare the average-case competitive ratio with 
the worst-case lower bound in Theorem 2. To do so, we need some idea of the 
optimal offline value of EGC($I_{n,m}$). We will see that these random instances 
exhibit a sharp threshold in objective value at $m = \frac{1}{4}n$, which we will prove by 
analyzing a greedy heuristic for offline EGC. 

Throughout the paper, we will rely on a “component-identification algo- 
rithm”. This algorithm, and our method of analysis, is quite standard in the 
random-graph literature; see for example the giant-component chapter of Ran- 
dom Graphs [JLR00, pp. 108-111]. 

Our component-identification algorithm, Algorithm A, maintains two sets of 
vertices, called unborn, $U_i$, and alive, $A_i$. Initially, a single vertex is alive, $A_1 = 
\{v_1\}$, and the remainder are unborn, $U_1 = [n] \setminus \{v_1\}$. At step $i$, we look at 
all the neighbors of some vertex $v_i \in A_i$. We kill $v_i$ and give birth to all its 
unborn neighbors (formally, let $P_i = U_i \cap N(v_i)$ be the progeny of $v_i$, and set 
$A_{i+1} = (A_i \setminus \{v_i\}) \cup P_i$ and $U_{i+1} = U_i \setminus P_i$). 
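
The exploration process just described can be sketched in a few lines. This is an illustrative version, assuming the graph is given as a dictionary `adj` mapping each vertex to the set of its neighbors; the function name `explore_component` is hypothetical.

```python
def explore_component(adj, start):
    """Algorithm A sketch: explore the component of `start`.

    `unborn` and `alive` play the roles of U_i and A_i; each iteration
    kills one alive vertex v_i and gives birth to its unborn neighbors
    P_i = U_i & N(v_i).
    """
    unborn = set(adj) - {start}
    alive = {start}
    dead = set()
    while alive:
        v = alive.pop()            # the vertex v_i killed at this step
        progeny = unborn & adj[v]  # P_i
        unborn -= progeny          # U_{i+1} = U_i \ P_i
        alive |= progeny           # A_{i+1} = (A_i \ {v_i}) | P_i
        dead.add(v)
    return dead                    # all vertices of the component
```

The process halts when no vertex is alive, at which point the killed vertices are exactly the component containing `start`; the number of steps equals the component's order, which is what the analysis tracks.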

Our greedy heuristic is very similar to Algorithm A. Roughly, we try starting 
at each vertex, and using the first edge we see from each pair. We will elaborate 
on this description in the proof of Theorem 3. 

Theorem 3. For any fixed $\epsilon > 0$, for $m = \frac{1}{4}(1 - \epsilon)n$, we have EGC($I_{n,m}$) = 
$O(\log n)$ while for $m = \frac{1}{4}(1 + \epsilon)n$, our greedy heuristic finds a solution showing 
EGC($I_{n,m}$) = $\Omega(n)$ whp. 

The below-the-threshold half of the theorem follows from well-known results in 
the theory of random graphs, since the union of all the edges in all the pairs is 
a random graph with $\frac{1}{2}(1 - \epsilon)n$ edges, which is below the threshold for a giant 
component (see, for example, [JLR00]). 

It is interesting to note that below the threshold, the largest component in 
the union of the edges contains at most one edge from each pair whp, so for 
$m = \frac{1}{4}(1 - \epsilon)n$, we can solve EGC($I_{n,m}$) optimally whp. 

The above-the-threshold half of the theorem is proved in Section 3, in a 
manner similar to the analysis of the giant component above the threshold in 
$G_{n,p}$. 

Our next theorem shows that even in the average case, any online algorithm 
performs much worse than offline. 

Theorem 4. For $\epsilon < 0.003$, on instances $I_{n,m}$ with $m = \frac{1}{4}(1 + \epsilon)n$, every online 
algorithm finds a component of size only $O((\log n)^{3/2})$ whp. 







A. Flaxman, D. Gamarnik, and G.B. Sorkin 



Theorems 3 and 4 together give a lower bound on the average-case competitive 
ratio for EGC: the ratio of offline solution to online is $\Omega(n/(\log n)^{3/2})$ whp. This 
shows that the lower bound on competitive ratio for EGC is more robust than 
Theorem 2 alone indicates. 

Theorem 4 is only true for some range of e, however. For example, if e > 1, 
then taking the first edge from each pair yields a random graph above the giant 
component threshold, and so this trivial algorithm has a constant competitive 
ratio. We go slightly beyond the trivial bound in the next theorem. 

Consider Algorithm C, which does the following: for some $\gamma$ to be determined 
later, for the first $\gamma n$ choices we take the first edge of each pair. For the remaining 
$m - \gamma n$ choices, we take the first edge unless it touches an isolated vertex, in 
which case we take the second edge. 
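
The rule defining Algorithm C is short enough to sketch directly. The version below makes two assumptions that the text leaves implicit: a vertex counts as isolated if no previously chosen edge touches it, and $\gamma n$ is rounded down to an integer. The function name `algorithm_c` is hypothetical.

```python
def algorithm_c(n, pairs, gamma):
    """Online rule of Algorithm C on vertices 0..n-1.

    For the first floor(gamma * n) pairs, take the first edge. Afterwards,
    take the first edge unless it touches a vertex that is isolated in the
    graph of edges chosen so far, in which case take the second edge.
    """
    degree = [0] * n
    chosen = []
    cutoff = int(gamma * n)
    for idx, (first, second) in enumerate(pairs):
        edge = first
        if idx >= cutoff and any(degree[v] == 0 for v in first):
            edge = second
        chosen.append(edge)
        for v in edge:
            degree[v] += 1
    return chosen
```

Each decision depends only on the pairs seen so far, so the rule is a legitimate online algorithm; the choice of `gamma` is left open here, as it is in the text.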

Theorem 5. For $\epsilon < 0.015$, on instances $I_{n,m}$ with $m = \frac{1}{2}(1 - \epsilon)n$, Algorithm C 
yields a component of size $\Omega(n)$ whp. 

2 Proofs of Worst-Case Theorems 

Proof of Theorem 1: To show the hardness of approximating EGC, we re- 
duce from MAX 3SAT-5. MAX 3SAT-5 is a structured relative of MAX 3-SAT, 
introduced by Feige, where every variable appears in exactly 5 clauses and a 
variable does not appear in a clause more than once. Feige proves that there is 
some $\epsilon > 0$ for which it is NP-hard to distinguish a satisfiable instance from an 
instance with at most $(1 - \epsilon)m$ satisfiable clauses [Fei98]. 

Given a MAX 3SAT-5 instance, we make an EGC instance with $n + 3m + 1$ ver- 
tices by including a vertex for each literal $\ell$, 3 vertices $C_1, C_2, C_3$ for each clause $C$, 
and an additional “root” vertex, $r$. We model the assignment by $n$ edge pairs 
which decide if each variable is true or false: let pair $i$ be $(\{r, x_i\}, \{r, \bar{x}_i\})$. We 
include 3m additional pairs: for each clause C, let £ be the j-th literal in C 
(where j G {1, 2, 3}), and include a pair of the form {{£, Cj}, 3)}). 

If the instance is satisfiable, there is a way of selecting edges which yields a 
component of size $3m + n + 1$. On the other hand, any selection of edges from 
the first $n$ pairs corresponds naturally to some assignment. If a literal is not 
selected, then since it appears in at most 5 clauses and is not connected to the 
root, it can be in a component of size at most 16. Since it is NP-hard to distin- 
guish satisfiable 3SAT-5 instances from instances with at most $(1 - \epsilon)m$ clauses 
satisfiable, it is also NP-hard to distinguish instances of EGC with a component 
of size |m -I- 1 from those with a component of size at most (| — e)m-|-l. □ 

Proof of Theorem 2: We will present a sequence of edge pairs, depending 
on the previous choices of the algorithm. The edge pairs will all come from a 
complete binary tree, and the edges in each pair will be siblings, i.e., of the form 
{{x,y},{x,y'}). Whatever the algorithm chooses at step i — and for a fixed 
deterministic algorithm this choice is predictable — we make it wish it chose 
otherwise. So, if the algorithm selects edge {x,y}, the next pair we give it is 
Thus, the online algorithm obtains a graph with only isolated edges, while 
making the opposite choice at every step would yield a component with $m$ edges. 

□ 
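
The adversary argument can be simulated. The sketch below is one adversary consistent with the statement of Theorem 2, under the assumption that after each choice the next sibling pair hangs off the child the algorithm rejected; the exact construction in the proof is not fully legible in this copy, so this is an illustration and not a transcription. The names `play_adversary` and `largest_component`, and the callback signature `choose(pair, history)`, are hypothetical.

```python
def play_adversary(choose, m):
    """Feed m sibling pairs to a deterministic online algorithm.

    `choose(pair, history)` must return 0 or 1. Each pair consists of the
    two edges from the current tree vertex to two fresh children; the next
    pair hangs off the child that the algorithm did NOT connect.
    """
    chosen, rejected = [], []
    parent, next_label = 0, 1
    for _ in range(m):
        pair = ((parent, next_label), (parent, next_label + 1))
        next_label += 2
        pick = choose(pair, tuple(chosen))
        chosen.append(pair[pick])
        rejected.append(pair[1 - pick])
        parent = pair[1 - pick][1]  # descend through the rejected child
    return chosen, rejected


def largest_component(edges):
    """Order of the largest component spanned by `edges` (0 if empty)."""
    root = {}

    def find(v):
        root.setdefault(v, v)
        while root[v] != v:
            root[v] = root[root[v]]
            v = root[v]
        return v

    for u, v in edges:
        root[find(u)] = find(v)
    sizes = {}
    for v in list(root):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values(), default=0)
```

Whatever deterministic rule is plugged in as `choose`, the chosen edges are pairwise disjoint (largest component of order 2), while the rejected edges form a path on $m + 1$ vertices, matching the ratio $(m + 1)/2$.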

Of course, the same $(m + 1)/2$ ratio also applies to randomized online algo- 
rithms, if the adversary is allowed to see the algorithm’s choice before constructing 
the next pair. Even if the adversary is required to fix a sequence of pairs in ad- 
vance, and even if she does not know what randomized algorithm is being used, 
there is an almost equally bad instance. It is given by a random path down the 
tree and the siblings of the path edges, each edge paired with its sibling, the 
pairs presented in order from root to leaf. At each step, the online algorithm has 
probability only 1/2 of choosing the path edge rather than its sibling, and hence 
whp gets a largest component of size only $O(\log n)$. 



3 Proofs of Average-Case Theorems 



Proof of Theorem 3: We repeat more formally the greedy heuristic sketched 
in the introduction, in a form conducive to analysis. Algorithm B repeats the 
following $n$ times, starting with each possible vertex for $v_1$. At each step, we 
maintain two sets of vertices, called unborn, $U_i$, and alive, $A_i$. Initially, a single 
vertex is alive, $A_1 = \{v_1\}$, and the remainder are unborn, $U_1 = [n] \setminus \{v_1\}$. At step 
$i$, we choose some vertex $v_i \in A_i$ and identify all previously unidentified pairs 
with an edge incident to $v_i$. For each such edge pair, we use the edge incident to 
$v_i$. We let $P_i = N(v_i) \cap U_i$ denote the set of newly discovered vertices, and we 
set $A_{i+1} = (A_i \setminus \{v_i\}) \cup P_i$ and $U_{i+1} = U_i \setminus P_i$. 
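
The greedy heuristic can be sketched as follows. This is an illustrative reading of Algorithm B, with one tie-breaking assumption not fixed by the text: if both edges of a pair are incident to the current vertex, the first one is used. The names `algorithm_b_from` and `algorithm_b` are hypothetical, and the linear scan over all pairs at every step is chosen for clarity, not efficiency.

```python
def algorithm_b_from(n, pairs, start):
    """One run of the greedy exploration from `start` on vertices 0..n-1.

    Returns (component, chosen_edges). Pairs never identified during the
    run are left undecided; any choice for them completes a solution whose
    largest component contains `component`.
    """
    unborn = set(range(n)) - {start}
    alive = {start}
    used = [False] * len(pairs)
    chosen = []
    component = {start}
    while alive:
        v = alive.pop()
        for idx, pair in enumerate(pairs):
            if used[idx]:
                continue
            # First edge of the pair incident to v, if there is one.
            edge = next((e for e in pair if v in e), None)
            if edge is None:
                continue
            used[idx] = True
            chosen.append(edge)
            w = edge[0] if edge[1] == v else edge[1]
            if w in unborn:  # w is newly discovered: it joins P_i
                unborn.remove(w)
                alive.add(w)
                component.add(w)
    return component, chosen


def algorithm_b(n, pairs):
    """Try every start vertex and keep the run with the largest component."""
    runs = (algorithm_b_from(n, pairs, v) for v in range(n))
    return max(runs, key=lambda run: len(run[0]))
```

On the small instance `[((0, 1), (2, 3)), ((1, 2), (3, 4)), ((0, 2), (2, 3))]` with 5 vertices, starting from vertex 2 identifies all three pairs at once and yields the component `{0, 1, 2, 3}`.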

For analysis, it is convenient to work with an instance resembling $G_{n,p}$. Let 
$I_{n,p}$ be a random instance formed by including each pair of edges independently 
with probability $p$. Thus, our probability space is $\{0, 1\}^{\binom{n}{2}^2}$ with the product 
measure. We will show that the threshold value is $p = n^{-3}$, which has expected $\frac{n}{4}$ 
pairs. We do so by analyzing the behavior of Algorithm B on $I_{n,(1+\epsilon)n^{-3}}$, which 
proceeds in two claims. We will then translate this result to random instances 
$I_{n,m}$ where $m = \frac{1}{4}(1 + \epsilon)n$. 

Let $p = (1 + \epsilon)n^{-3}$, and let $\beta, \delta > 0$ be such that $(1 + \epsilon)(1 - \beta)^3 = 1 + \delta$ and 
let $t_0 = 8(1 + \delta)\delta^{-2} \log n$ and $t_1 = \beta n$. 

Claim 1: Running Algorithm B on $I_{n,p}$ with any starting vertex $v_1$, either 
the algorithm halts before step $t_0$ or for all $t_0 \le t \le t_1$ we have $|A_t| \ge 1$ whp. 

For this it is sufficient, for each t with to < t < ti, to identify t vertices of the 
component. Before we have identified a size (3n component, there are at least 
(1 — (})n unborn vertices. So there are at least 2(n(l — /?)) ~ (n(l — (3))^ 

candidate edge pairs which contribute a unique vertex to $P_i$. Thus we have 

$$\mathbf{P}\Bigl[\sum_{i=1}^{t} |P_i| < t\Bigr] \le \mathbf{P}\Bigl[\sum_{i=1}^{t} B\bigl((n(1 - \beta))^3, p\bigr) < t\Bigr].$$
We also have 

$$\mathbf{E}\Bigl[\sum_{i=1}^{t} B\bigl(n^3(1 - \beta)^3, p\bigr)\Bigr] = (1 + \epsilon)(1 - \beta)^3 t = (1 + \delta)t,$$

so we use a standard Chernoff bound to show the probability that Algorithm B, 
starting at any vertex, halts at time $t$ for any $t_0 \le t \le t_1$ is at most 

^B(n3(l-/3)3,p) <t 

t^to L i=i -I t^to \ \ ^ ) / 

< n~^. 

Claim 2: There is some vertex $v$ so that starting Algorithm B on $v$ yields a 
component of size at least $t_0$ whp. 

For this, we start Algorithm B on some vertex $v$, and if it fails to discover $t_0$ 
vertices, we start it on an unexplored vertex, $v'$, and keep going. Each run, we 
expose at most $t_0$ vertices, so if we fail $t_0$ times, the number of edges exposed at 
each step dominates $B(n^3(1 - \beta)^3, p)$. Now, for Algorithm B to fail in every run, 
we must have that the total number of vertices exposed is less than the number 
of steps. But 

p[^B(n3(l - (3f,p) < \ < e-^"*oCC+s) ^ 

- i=l 

Therefore, we have some vertex where Algorithm B runs for at least to steps 
with probability 1 — o(n“^). 

Claims 1 and 2 imply Algorithm B finds a component of size $t_1 = \beta n$ in $I_{n,p}$ 
whp. 

To translate this result from /„ p to In,rm note that the probability /„_p has 
exactly \p = ^(1 + e)n =: m edge pairs is 

where Ad denotes the event that I has m distinct edge pairs. Also note that the 
probability In,m consists of m distinct edge pairs is 

And, by symmetry, for any particular instance I* we have 

= I* \M]= P/„,„ [I„.p = I* \M\. 
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So the probability of any event £ in the In,m model is related to the probability 
in the model by 

[^] = Pc.™ I [M] + P/„,^ [£: 1 7W]P,„,^ [M] 

= P/„_ I Af] + 0(n-2) 

= 0(ni/2)P7„., [Af ]Pz„.^ [S\M] + 0{n-^) 

<0{n^/^)¥j^J£] + 0{n-^). 

Since the failure probability was 0{n~^) in the /„^p model, it is 0(n“^/^) for 

^n^m- ^ 

Proof of Theorem 4: We will analyze a wider class of algorithms. Instead 
of requiring the algorithm to choose edges at each step of the process, we will 
generate the first $\gamma n$ pairs, and allow the process to keep any components in the 
union with at least 2 edges, and additionally to keep up to $\gamma n$ of the isolated 
edges. Then we will generously allow the process to keep all edges from an 
additional $\frac{1}{4}(1 + \epsilon)n - \gamma n$ pairs. A nonrigorous intuition for our proof is that the 
first $\gamma n$ pairs are “pretty much” isolated edges, and so the graph resulting from 
this process “looks like” the union of $\gamma n + 2(\frac{1}{4}(1 + \epsilon) - \gamma)n = \frac{1}{2}(1 + \epsilon - 2\gamma)n$ 
random edges. For $\gamma > \epsilon/2$ such a graph is below the threshold for the giant 
component. 

This heuristic argument does not translate directly into a rigorous proof 
because the union of the edges in the $\gamma n$ pairs is not a collection of isolated 
edges. To work around this, we will bound the contribution of the components 
of 3 or more vertices in the union of the first $\gamma n$ pairs. Note that, by symmetry, it 
makes no difference which $\gamma n$ isolated edges the algorithm selects, so in this wider 
class of algorithms, the results of any selection process are the same. To prove 
the theorem, we decompose the graph into two parts. Let $G'$ be the union of $\gamma n$ 
isolated edges and the components containing at least 3 vertices in the union of 
$2\gamma n$ random edges. Let $G''$ be the union of $2(\frac{1}{4}(1 + \epsilon) - \gamma)n = \frac{1}{2}(1 + \epsilon - 4\gamma)n$ 
edges. We show that whp $G' \cup G''$ contains no component of size exceeding 

=J-i6(l-y)-2(logn)3/2^ 

To simplify calculations, we make $G''$ a realization of $G_{n,p}$, with $p = (1 + 
\epsilon - 4\gamma)/n$. We will translate our results to $G_{n,m}$ at the end of the proof. Let 
$\epsilon, \gamma, \delta > 0$ so that 

(l + e-4y)(^e ^ + 4ye '^ + 2y + j = 1 - 2<5. 

Note that such $\epsilon, \gamma, \delta$ exist, for example taking $\epsilon = 0.003$, $\gamma = 0.003$ and $\delta \approx 
0.003177$. 

Let $T_k$ denote the number of components in $G'$ with $k$ vertices. Given the
values of the $T_k$'s, we use an exposure procedure similar to Algorithm A to prove
$G' \cup G''$ has no large components. We expose all the vertices adjacent to $v_i$ in
$G''$, and if we discover a vertex of a connected component of $G'$, we add every
vertex of this component to the set $P_i$. Thus, at each step, and conditioned
on any history, the size of $P_i$ is stochastically dominated by

$$\sum_{k=1}^{n} k\,B(kT_k, p),$$



and the probability that we discover a component of size exceeding $t_1$ is bounded
by

$$P\Big[\sum_{i=1}^{t_1} |P_i| \ge t_1\Big] \le P\Big[\sum_{i=1}^{t_1}\sum_{k=1}^{n} k\,B(kT_k,p) \ge t_1\Big].$$



Let £i denote the event that G' contains no component with more than 
K = 6(1 — 7 )“^logn vertices. Standard arguments show P[£li] < 0(n“^) (see, 
for example, [JLROO]). If £i holds, we need only consider the sum of weighted 
Binomial r.v.’s up to the i^-th term. In other words, £i implies 



$$\sum_{k=1}^{n} k\,B(kT_k,p) = \sum_{k=1}^{K} k\,B(kT_k,p).$$









Let $Z = \sum_{k=1}^{K} k^2 T_k p$. Note that $Z$ is the expectation of the sum above
conditioned on the $T_k$'s. Also note that the value of $Z$ is dependent on $G'$ only. We
now obtain a bound on $Z$ that holds whp.

We use a tree census result for sparse random graphs. It is known that in
$G_{n,m=cn/2}$ we have the following (see, for example, Pittel [Pit90]):

$$E[T_k] \sim n\,\frac{k^{k-2} c^{k-1} e^{-ck}}{k!},$$

which, in $G'$, applies to $T_k$ with $k \ge 3$. We also have $T_2 \le \gamma n$, and
$E[T_1] = e^{-4\gamma}n + 4\gamma e^{-8\gamma}n - 2T_2$. So we have



K 



E[Z] = E 



^ k^TkP 






< (1 + e - 47 ) + 476-®^ + 2^ + ^ /k^^ 

^ ^ k=3 ' 

< (1 + e — 47) + 476“®^ + 27 + (47)“^ (476^“^^)^ 'j 

fc=3 ' 

= (1 + e - 47) (^e- + 47c- ^ + 27 + ^ _ 4 ^gi -47 



= 1 - 2A 
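As a numerical sanity check on the tree census formula quoted above, the following sketch evaluates $k^{k-2}c^{k-1}e^{-ck}/k!$ in log space and sums $k$ times this quantity over $k$. For $c < 1$ almost all vertices lie in tree components, so the sum should be close to 1. The function names and the cutoff `k_max` are illustrative assumptions, not part of the paper.

```python
import math

def tree_fraction(k, c):
    """Approximate E[T_k]/n = k^(k-2) c^(k-1) e^(-ck) / k!, computed in log space."""
    log_val = ((k - 2) * math.log(k) + (k - 1) * math.log(c)
               - c * k - math.lgamma(k + 1))
    return math.exp(log_val)

def vertex_mass(c, k_max=400):
    """Fraction of vertices in tree components of size at most k_max (assumed cutoff)."""
    return sum(k * tree_fraction(k, c) for k in range(1, k_max + 1))

if __name__ == "__main__":
    print(vertex_mass(0.5))           # close to 1 for a subcritical density
    print(tree_fraction(1, 0.012))    # isolated vertices: e^{-c}
```

With $c = 4\gamma$ the $k = 1$ term is $e^{-4\gamma}$, which is the isolated-vertex term in the expression for $E[T_1]$.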



Let $\mathcal{E}_2$ be the event that $Z \le 1 - \delta$. We use a form of the Azuma-Hoeffding
inequality due to McDiarmid (see [Hoe63,McD89]) to show $\mathcal{E}_2$ holds whp. Note
that changing one edge of $G'$ can create or destroy at most two components in
$G'$. So this can change the value of $Z$ by at most $2K^2p$. Therefore,



¥[82] =¥[Z>l- 6 ] 

< ¥[Z > E[Z] + 5] 
252 



< exp — 



{2jn){2K^p)^ 



< exp — 



6'^n \ 

) ■ 



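Conditioning on $\mathcal{E}_1$ and $\mathcal{E}_2$, we have

$$\sum_{i=1}^{t_1}\sum_{k=1}^{n} k\,B(kT_k,p) = \sum_{i=1}^{t_1}\sum_{k=1}^{K} k\,B(kT_k,p), \qquad (1)$$

and

$$E\Big[\sum_{i=1}^{t_1}\sum_{k=1}^{K} k\,B(kT_k,p)\Big] = Z t_1 \le (1-\delta)t_1.$$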



To bound the probability that sum (1) is larger than $t_1$, we use the following
Chernoff bound, from [AS00, Theorem A.1.18].

Theorem 6. Let $X_i$, $1 \le i \le n$, be independent random variables with each
$E[X_i] = 0$ and no two values of any $X_i$ ever more than one apart. Set
$S = X_1 + \cdots + X_n$. Then $P[S > a] < \exp(-2a^2/n)$.

Applying this to (1), we have 



ti n 



EE fcB(/cTfc,p) > t\ 



i=i k=i 



£1,82 



< 



< 



ti K 



EE kB{kTk,p) > Zti + Sti 



i=i k=i 
r ti K kTk 






i—1 k—1 j—1 



K 



< exp ^—2 {Sti/Ky 



= n 



So the probability there exists any vertex on which we run for at least ti steps 
is at most n~^ by the union bound. 

Since P[fi] + ¥[£2] < o(n“^), we complete the theorem by observing that 
$G_{n,p}$ has at least $\frac12(1+\epsilon-4\gamma)n$ edges with constant probability, and extra edges
can only increase the size of the largest component, so our claim also holds in
$G_{n,m}$. □

Proof of Theorem 5: We bound the size of components formed by this
process by exposing edges starting from a vertex $v_1$ and tracking the number of
vertices unborn and alive.

Consider decomposing the graph into $G'$, the edges selected before time $\gamma n$,
and $G''$, the edges selected after.

To simplify calculations, we take $G'$ to be a realization of $G_{n,p}$, with $p = 2\gamma/n$.
Also, we generate $G''$ by applying our selection rule to a realization of $I_{n,p'}$, with
$p' = 2(1-\epsilon-2\gamma)n^{-3}$ (recall this is an instance where every pair of edges is
included independently with probability $p'$). Thus the expected number of edges
in $G' \cup G''$ is $\gamma n + \frac12(1-\epsilon-2\gamma)n = \frac12(1-\epsilon)n$, as it should be.

Let a, 13, 7, S,e,r],6 > 0 be such that (1 — P){‘2^) = 2y — e/2 + 6, and 

(1 - /3 - (1 + - l3f/2 + (1 - /3 - (1 + 7)e-2T')(l - S)e-^^1 - P) = a 

and 

o2(l - e - 27) = 1 - 27 - e/2 + 6». 

Note that such parameters exist, for example P = p = 10“®, 7 = 0.4, e = 
0.015, yielding a « 0.521, S « 0.007, and 9 « 0.0002. Let to = 8max{,5-2(l - 
P)j,9~'^{l — e — 27)} log n and ti = Pn. We wish to bound the probability 
Algorithm C halts with a component of size to < t < ti. For this, it is sufficient 
to bound the probability 1^*1 < t for alHo Pit <ti. We decompose Pi into 
Pi = P'LiP'', where P' are the progeny contributed by edges in G' and P" are the 
progeny contributed by edges in G" (and not by edges in G'). If y/'_i iPd < t, 
then either Y.Li l^/l ^ (27 - e/2)t or Y!i=i l^"l < (1 - 2y + e/2)t. 

Now, at any step t, conditioned on any history that has not yet discovered 
Pn vertices, we have |P/| stochastically dominates B((l — P)n,p) and 

r i 1 r * 1 

P ^|P'|<(27-e/2)t <P ^B((l-/3)n,p)<(27-e/2)t 

- i=l -I L i^Y 

< g-5^p47(l-/3) 

<n-\ 

Let $\mathcal{E}_3$ denote the event that $G'$ contains $(1 \pm \eta)e^{-2\gamma}n$ isolated vertices. We
omit a simple calculation using Chebyshev's inequality to show $\mathcal{E}_3$ holds whp.

Conditioning on Ez, we have that at any step t, conditioned on any history 
that has not yet discovered Pn vertices, there are at least (1 — /3 — (1 + p)e~'^^)n 
vertices that are not isolated in G' and are still in [7j. So there are at least 
((1 — /3— (l + ?7)e“^'’')(l — /3)^/2+(l — /3— (l + ?7)e“^''')(l — (5)e“^^(l — /3))n^ = an^ 
unexposed edge pairs which would cause our selection rule to place a vertex in 
P'. So 

r i 1 r * 1 

P ^|P"| < (l-27-e/2)t <P ^B(an 3 ,p')<(l- 27 -e/ 2 )f 
- i^l -I L 

< g-e7/4a(l-c-27) < ^-4^ 

Thus by the union bound, the probability that some component has size t for 
to < t < ti is 0{n~'^). 
Now, as in the proof of Theorem 3, we argue that some component has size
at least $t_0$ whp. The argument is identical to the earlier theorem, and the size
of the progeny at each stage is bounded identically to the previous paragraph,
so we omit further details.

Finally, we observe that the probability there is no giant component is 
so we can convert to the original model as in Theorem 3. □ 



References 



[AS00] Noga Alon and Joel H. Spencer, The probabilistic method, second ed.,
Wiley-Interscience Series in Discrete Mathematics and Optimization,
Wiley-Interscience [John Wiley & Sons], New York, 2000, With an appendix on
the life and work of Paul Erdős. MR 2003f:60003

[BF01] Tom Bohman and Alan Frieze, Avoiding a giant component, Random
Structures Algorithms 19 (2001), no. 1, 75-85. MR 2002g:05169

[BFW02] Tom Bohman, Alan Frieze, and Nicholas C. Wormald, Avoiding a giant
component II, manuscript, 2002.

[BK03] Tom Bohman and David Kravitz, Creating a giant component, manuscript,
2003.

[Fei98] Uriel Feige, A threshold of ln n for approximating set cover, J. ACM 45
(1998), no. 4, 634-652. MR 20001:68049

[Hoe63] Wassily Hoeffding, Probability inequalities for sums of bounded random
variables, J. Amer. Statist. Assoc. 58 (1963), 13-30. MR 26 #1908

[JLR00] Svante Janson, Tomasz Łuczak, and Andrzej Ruciński, Random graphs,
Wiley-Interscience Series in Discrete Mathematics and Optimization,
Wiley-Interscience, New York, 2000. MR 2001k:05180

[McD89] Colin McDiarmid, On the method of bounded differences, Surveys in
combinatorics, 1989 (Norwich, 1989), London Math. Soc. Lecture Note Ser., vol.
141, Cambridge Univ. Press, Cambridge, 1989, pp. 148-188. MR 91e:05077

[Pit90] Boris Pittel, On tree census and the giant component in sparse random
graphs, Random Structures Algorithms 1 (1990), no. 3, 311-342. MR
921:05087

[SSS02] Mark Scharbrodt, Thomas Schickinger, and Angelika Steger, A new average
case analysis for completion time scheduling, Proceedings of the 34th Annual
ACM Symposium on Theory of Computing (STOC), 2002, pp. 170-178.




Sampling Grid Colorings with Fewer Colors 



Dimitris Achlioptas^1, Mike Molloy^2, Cristopher Moore^3, and
Frank Van Bussel^4

^1 Microsoft Research, optas@microsoft.com
^2 Dept of Computer Science, University of Toronto, and Microsoft Research,
molloy@cs.toronto.edu
^3 Computer Science Department, University of New Mexico, moore@santafe.edu
^4 Dept of Computer Science, University of Toronto, fvb@cs.toronto.edu



Abstract. We provide an optimally mixing Markov chain for 6-colorings 
of the square grid. Furthermore, this implies that the uniform distribu- 
tion on the set of such colorings has strong spatial mixing. Four and five 
are now the only remaining values of k for which it is not known whether 
there exists a rapidly mixing Markov chain for $k$-colorings of the square
grid. 



1 Introduction 

Sampling and counting graph colorings is a fundamental problem in computer 
science and discrete mathematics. Much focus has gone towards attacking this 
problem using rapidly mixing Markov chains; see for example [3,12,14,18,22]. 

Sampling graph colorings is also of fundamental interest in statistical physics. 
Graph colorings correspond to the zero-temperature case of the antiferromag- 
netic Potts model, a model of magnetism on which physicists have performed 
extensive numerical experiments (see for instance [10,16,15]). Physicists wish to 
estimate physical quantities such as spatial correlations and magnetization, and 
to do this they attempt to sample random states using Markov chains. 

Moreover, optimal temporal mixing, i.e. a mixing time of $O(n \log n)$, is deeply
related to the physical properties of the system [9]. In particular, it implies spatial
mixing, i.e. the exponential decay of correlations, and thus the existence of a finite
correlation length and the uniqueness of the Gibbs measure. Therefore, optimal
mixing of natural Markov chains for $q$-colorings of the grid is considered a major
open problem in physics (see e.g. [21]). Physicists have conjectured [10,21] that
the $q$-state Potts model has spatial mixing for $q \ge 4$. This has been established
rigorously for $q \ge 7$ by Bubley, Dyer and Greenhill [3] who showed that all
4-regular triangle-free graphs, such as the square grid, have optimal mixing.
Our main result is that the square grid has optimal mixing for $q = 6$. We
prove this by considering the following Markov chain, often called the block
heat-bath dynamics, which we call $M(i,j)$. At each step, we choose an $i \times j$ subgrid $S$ of
$G$, uniformly at random from amongst all such subgrids (i.e. its upper-left vertex
is chosen uniformly from the vertices of $G$). Let $C$ be the set of $q$-colorings of $S$
which are consistent with the coloring of $G \setminus S$. We choose a uniformly random
coloring $c \in C$ and recolor $S$ with $c$. Our main theorem is:

Theorem 1. $M(2,3)$ on 6-colorings of the square grid mixes in $O(n \log n)$ time.
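One step of the block heat-bath dynamics described above can be sketched as follows. This is a minimal illustration on an L x L torus that finds the consistent recolorings of the block by brute-force enumeration; the data layout (a dict from grid coordinates to colors) and all function names are assumptions made for the example, not the authors' code.

```python
import itertools
import random

def neighbors(v, L):
    """The four torus neighbors of vertex v = (x, y)."""
    x, y = v
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]

def block_vertices(corner, i, j, L):
    """Vertices of the i x j block whose first vertex is `corner`."""
    x0, y0 = corner
    return [((x0 + dx) % L, (y0 + dy) % L) for dx in range(i) for dy in range(j)]

def consistent_colorings(coloring, block, q, L):
    """All proper q-colorings of the block that agree with the colors outside it."""
    inside = set(block)
    result = []
    for colors in itertools.product(range(q), repeat=len(block)):
        trial = dict(zip(block, colors))
        ok = all(
            trial[v] != (trial[w] if w in inside else coloring[w])
            for v in block
            for w in neighbors(v, L)
        )
        if ok:
            result.append(trial)
    return result

def block_heat_bath_step(coloring, i, j, q, L, rng):
    """Pick a uniformly random block and recolor it uniformly among consistent colorings."""
    corner = (rng.randrange(L), rng.randrange(L))
    block = block_vertices(corner, i, j, L)
    coloring.update(rng.choice(consistent_colorings(coloring, block, q, L)))

if __name__ == "__main__":
    L, q = 6, 6
    rng = random.Random(0)
    coloring = {(x, y): (x + y) % 3 for x in range(L) for y in range(L)}
    block_heat_bath_step(coloring, 2, 3, q, L, rng)
    print(coloring)
```

Brute force over $q^{ij}$ candidates is only practical for small blocks such as $2 \times 3$; it is used here purely to keep the sketch short.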

We prove Theorem 1 for a variety of boundary conditions: on the torus, on 
finite rectangular regions with fixed colorings on their boundary, and on finite 
rectangular regions with free boundary conditions. Our method is similar to 
that of [3] in that it consists of a computer-assisted proof of the existence of a 
path coupling. At the same time, we exploit the specific geometry of the square 
grid to consider a greater variety of neighborhoods. Moreover, the calculations 
necessary to find a good coupling in our setting are far more complicated than 
those in [3] and require several new ideas to become computationally tractable. 

Using the comparison method of Diaconis and Saloff-Coste [5,17], Theorem 1
implies that the Glauber and Kempe chain dynamics also mix in polynomial time:

Theorem 2. The Glauber dynamics and the Kempe chain dynamics on
6-colorings of the square grid mix in $O(n^2 \log n)$ time.

Like Theorem 1, this result holds both on the torus and on finite rectangular 
regions with fixed or free boundary conditions. 

Consider now a finite region $V$ and two colorings $C, C'$ of its boundary that
differ at a single site $v$, and a subregion $U \subseteq V$ such that the distance from $v$ to
the nearest point $u \in U$ is $\ell$. Let $\mu$ and $\mu'$ denote the probability distributions
on colorings of $U$, given the uniform distribution on colorings of $V$ conditioned
on $C$ and $C'$ respectively. We say that $q$-colorings have optimal spatial mixing if
there are constants $\alpha, \beta > 0$ such that $\|\mu - \mu'\| \le \beta |U| \exp(-\alpha \ell)$. In other words,
conditioning on particular colors appearing on vertices far away from $u$ has an
exponentially small effect on the conditional distribution of the color of $u$ in a
uniformly random coloring of the grid. Physically, this means that correlations
decay exponentially as a function of distance, and that the system has a unique
Gibbs measure and no phase transition.

The following recent result of Dyer, Sinclair, Vigoda and Weitz [9] (see also
the lecture notes by Martinelli [14]) relates optimal temporal mixing with spatial
mixing: if the boundary constraints are permissive, i.e. a finite region can always
be colored no matter how we color its boundary, and the heat-bath dynamics on
some finite block mixes in $O(n \log n)$ time, then the system has strong spatial
mixing. As they point out, $q$-colorings are permissive for any $q \ge \Delta + 1$. Thus,
the fact that $M(2,3)$ has optimal temporal mixing implies a strong result about
spatial correlations.

Corollary 3. The uniform measure on the set of $q$-colorings of the square grid,
or equivalently the zero-temperature antiferromagnetic $q$-state Potts model on
the square lattice, has strong spatial mixing for $q \ge 6$.

As mentioned above, physicists conjecture spatial mixing for $q \ge 4$. In the last
section we discuss to what extent our techniques might be extended to $q = 4, 5$.



1.1 Markov Chains, Mixing Times, and Earlier Work 

Given a Markov chain $M$, let $\pi$ be its stationary distribution and $P_x^t$ be the
probability distribution after $t$ steps starting with an initial point $x$. Then, for
a given $\epsilon > 0$, the $\epsilon$-mixing time of $M$ is

$$\tau_\epsilon = \max_x \min\{t : \|P_x^t - \pi\| < \epsilon\},$$

where $\|P_x^t - \pi\|$ denotes the total variation distance

$$\|P_x^t - \pi\| = \frac12 \sum_y |P_x^t(y) - \pi(y)|.$$

In this paper we will often adopt the common practice of suppressing the
dependence on $\epsilon$, which is typically logarithmic, and speak just of the mixing time
$\tau$ for fixed small $\epsilon$. Thus the mixing time becomes a function of $n$, the number
of vertices, alone. We say that a Markov chain has rapid mixing if $\tau = \mathrm{poly}(n)$,
and optimal (temporal) mixing if $\tau = O(n \log n)$.
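To make the definition concrete, here is a small sketch that computes the $\epsilon$-mixing time of a toy chain by iterating the transition matrix from every starting state. It relies on the standard fact that the total variation distance to stationarity does not increase with $t$, so the first time at which every start is within $\epsilon$ equals the max-min in the definition. The three-state lazy chain and the function names are assumptions chosen for illustration.

```python
def tv_distance(mu, nu):
    """Total variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def step(dist, P):
    """One step of the chain: the row vector dist multiplied by the matrix P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]

def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t at which every starting state is within eps of pi."""
    n = len(P)
    dists = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(tv_distance(d, pi) for d in dists) < eps:
            return t
        dists = [step(d, P) for d in dists]
    raise ValueError("did not mix within t_max steps")

if __name__ == "__main__":
    # Lazy walk on three states: stay with probability 1/2, else move uniformly.
    P = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
    pi = [1 / 3, 1 / 3, 1 / 3]
    print(mixing_time(P, pi, 0.01))
```

For this toy chain the distance from any start after $t$ steps is $(2/3)(1/4)^t$, which first drops below 0.01 at $t = 4$.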

The most common Markov chain for this system is Glauber dynamics. There
are several variants of this in the literature, but for colorings we fix the following
definition. At each step, choose a random vertex $v \in G$. Let $S$ be the set of
colors, and let $T$ be the set of colors taken by $v$'s neighbors. Then choose a color
$c$ uniformly at random from $S \setminus T$, i.e. from among the colors consistent with the
coloring of $G - \{v\}$, and recolor $v$ with $c$. Independently, Jerrum [12] and Salas
and Sokal [18] proved that for $q$-colorings on a graph of maximum degree $\Delta$ the
Glauber dynamics has optimal mixing for $q > 2\Delta$, and Bubley and Dyer [2]
showed that it mixes in $O(n^3)$ time when $q = 2\Delta$. Note that $M(1,1)$ is simply
the Glauber dynamics.
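The Glauber update just defined is short enough to sketch directly. The example below runs it on an L x L torus with colors 0, ..., q-1; the representation of a coloring as a dict and the function names are illustrative assumptions.

```python
import random

def neighbors(v, L):
    """The four torus neighbors of vertex v = (x, y)."""
    x, y = v
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]

def glauber_step(coloring, q, L, rng):
    """Recolor a random vertex with a uniform color not used by its neighbors."""
    v = (rng.randrange(L), rng.randrange(L))
    taken = {coloring[w] for w in neighbors(v, L)}
    allowed = [c for c in range(q) if c not in taken]  # nonempty when q >= 5
    coloring[v] = rng.choice(allowed)

if __name__ == "__main__":
    L, q = 6, 6
    rng = random.Random(1)
    coloring = {(x, y): (x + y) % 3 for x in range(L) for y in range(L)}
    for _ in range(1000):
        glauber_step(coloring, q, L, rng)
    print(sorted(set(coloring.values())))
```

Because the new color always avoids the neighbors' colors, a proper coloring stays proper after every step.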

Dyer and Greenhill [7] considered a "heat bath" Markov chain which updates
both ends of a random edge simultaneously, and showed that it has optimal
mixing for $q > 2\Delta$. By widening the updated region to include a site and all of
its neighbors, Bubley, Dyer and Greenhill [3] showed optimal mixing for $q \ge 7$
for 4-regular triangle-free graphs, such as the square grid.

Another Markov chain commonly used by physicists is the Kempe chain
algorithm, which they call the zero-temperature case of the
Wang-Swendsen-Kotecký algorithm [23,24]. It works as follows: we choose a random vertex $v$
and a color $b$ which differs from $v$'s current color $a$. We construct the largest
connected subgraph containing $v$ which is colored with $a$ and $b$, and recolor this
subgraph by switching $a$ and $b$. In a major breakthrough, Vigoda [22] showed
that a similar Markov chain has optimal mixing for $q > (11/6)\Delta$, and this
implied that the Glauber dynamics and the Kempe chain algorithm both have
rapid mixing for $q > (11/6)\Delta$. However, for the square grid this gives only $q \ge 8$.
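A Kempe chain move as described above can be sketched with a simple graph search. The torus setting, the dict-based coloring, and the function names are assumptions made for illustration.

```python
import random

def neighbors(v, L):
    """The four torus neighbors of vertex v = (x, y)."""
    x, y = v
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]

def kempe_move(coloring, q, L, rng):
    """Swap colors a and b on the maximal a/b-colored component containing v."""
    v = (rng.randrange(L), rng.randrange(L))
    a = coloring[v]
    b = rng.choice([c for c in range(q) if c != a])
    component, frontier = {v}, [v]
    while frontier:  # depth-first search restricted to colors a and b
        u = frontier.pop()
        for w in neighbors(u, L):
            if w not in component and coloring[w] in (a, b):
                component.add(w)
                frontier.append(w)
    for u in component:
        coloring[u] = b if coloring[u] == a else a
    return len(component)

if __name__ == "__main__":
    L, q = 6, 6
    rng = random.Random(2)
    coloring = {(x, y): (x + y) % 3 for x in range(L) for y in range(L)}
    sizes = [kempe_move(coloring, q, L, rng) for _ in range(200)]
    print(max(sizes))
```

Since the component is maximal, no vertex outside it that touches it has color $a$ or $b$, so the swap keeps the coloring proper.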
For $q = 3$ on the square grid, Luby, Randall and Sinclair [13] showed that a
Markov chain including "tower moves" has rapid mixing for any finite
simply-connected region with fixed boundary conditions, and Randall and Tetali [17]
showed that this implies rapid mixing for the Glauber dynamics as well.
Recently Goldberg, Martin and Paterson [11] proved rapid mixing for the Glauber
dynamics on rectangular regions with free boundary conditions, i.e., with no
fixed coloring of the vertices on their boundary. However, the technique of [13,
11] relies on a bijection between 3-colorings and random surfaces through a
"height representation" which does not hold for other values of $q$.

2 Coupling 

We consider two parallel runs of our Markov chain, $M(2,3)$, with initial colorings
$X_0, Y_0$. We will couple the steps of these chains in such a way that (i) each
chain runs according to the correct distribution on its choices and (ii) with high
probability, $X_t = Y_t$ for some $t = O(n \log n)$. A now standard fact in this area
is that this implies that the chain mixes in time $O(n \log n)$, i.e. this implies
Theorem 1; this fact was first proved by Aldous [1] (see also [8]).

Bubley and Dyer [2] introduced the very useful technique of Path Coupling,
via which it suffices to do the following: Consider any two not necessarily proper
colorings $X, Y$ which differ on exactly one vertex, and carry out a single step of
the chain on $X$ and on $Y$, producing two new colorings $X', Y'$. We will prove
that we can couple these two steps such that (i) each step is selected according
to the correct distribution, and (ii) the expected number of vertices on which
$X', Y'$ differ is at most $1 - \epsilon/n$ for some constant $\epsilon > 0$. See, e.g., [8] for the
formal (by now standard) details as to why this suffices to prove Theorem 1.

We perform the required coupling as follows. We pick a uniformly random
$2 \times 3$ subgrid $S$, and let $C_X$ and $C_Y$ denote the sets of permissible recolorings of
$S$ according to $X, Y$ respectively. For each $c \in C_X$, we define a carefully chosen
probability distribution $p_c$ on the colorings of $C_Y$. We pick a uniformly random
member $c_1 \in C_X$ and in $X$ we recolor $S$ with $c_1$ to produce $X'$. We then pick a
random member $c_2 \in C_Y$ according to the distribution $p_{c_1}$ and in $Y$ we recolor $S$
with $c_2$ to produce $Y'$. Trivially, the pair $S, c_1$ is chosen according to the correct
distribution. In order to ensure that the same is true of $S, c_2$, we must have the
following property for the set of distributions $\{p_c : c \in C_X\}$:

$$\text{for each } c_2 \in C_Y, \quad \frac{1}{|C_X|} \sum_{c \in C_X} p_c(c_2) = \frac{1}{|C_Y|}. \qquad (1)$$
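The marginal condition (1) is easy to check mechanically for a candidate family of distributions. The sketch below does so with exact rational arithmetic; the nested-dict representation `p[c][c2]` for $p_c(c_2)$, the function name, and the toy sets are hypothetical choices for illustration.

```python
from fractions import Fraction

def satisfies_constraint_1(p, CX, CY):
    """Check that each p[c] is a distribution on CY and that (1) holds.

    p[c][c2] plays the role of p_c(c2); missing entries count as 0.
    """
    rows_ok = all(sum(p[c].get(c2, 0) for c2 in CY) == 1 for c in CX)
    cols_ok = all(
        sum(p[c].get(c2, 0) for c in CX) / Fraction(len(CX)) == Fraction(1, len(CY))
        for c2 in CY
    )
    return rows_ok and cols_ok

if __name__ == "__main__":
    CX, CY = ["a", "b"], ["u", "v", "w"]
    p = {
        "a": {"u": Fraction(2, 3), "v": Fraction(1, 3)},
        "b": {"v": Fraction(1, 3), "w": Fraction(2, 3)},
    }
    print(satisfies_constraint_1(p, CX, CY))
```

The example uses sets of different sizes to show that (1) does not require a one-to-one matching between $C_X$ and $C_Y$.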

Suppose that $v$ is the vertex on which $X, Y$ differ. If $v \in S$ then $C_X = C_Y$,
so we can simply define $c_2 = c_1$ (i.e. $p_c(c) = 1$ for each $c$) and this ensures that
$X' = Y'$. If $S$ does not contain $v$ or any neighbor of $v$, then again $C_X = C_Y$
and by defining $c_2 = c_1$ we ensure that $X', Y'$ differ only on $v$. If $v$ is not in $S$
but is adjacent to a vertex in $S$, then $C_X \ne C_Y$ so our coupling becomes very
complicated and it is quite possible that $c_2 \ne c_1$ and so $X', Y'$ will differ on one
or more vertices of $S$ as well as on $v$.
For any pair $C_X, C_Y$, we let $H(C_X, C_Y)$ denote the expected number of
vertices in $S$ on which $c_1, c_2$ differ. For every possible pair $C_X, C_Y$ we obtain a
coupling satisfying (1) and:

$$H(C_X, C_Y) < 0.52. \qquad (2)$$

Proof of Theorem 1. As described above, it suffices to prove that for any
choice of $X, Y$ differing only at $v$, the expected number of vertices on which
$X', Y'$ differ is less than $1 - \epsilon/n$ for some $\epsilon > 0$. The probability that $S$ contains
$v$ is $|S|/n = 6/n$; for any such choice of $S$, $X' = Y'$. The probability that $S$
contains a neighbor of $v$ but does not contain $v$ is easily seen to be $10/n$; for any
such choice of $S$, the expected number of vertices on which $X', Y'$ differ is less
than 1.52 (the vertex $v$ itself, plus fewer than 0.52 vertices of $S$ by (2)).
Therefore, the overall expected number of vertices on which $X', Y'$
differ is less than

$$1 \times (n-16)/n + 0 \times 6/n + 1.52 \times 10/n = 1 - 0.8/n.$$



□ 
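The two counts used in this proof (6 block positions containing $v$, 10 positions touching $v$ without containing it) and the final arithmetic can be checked by direct enumeration. The sketch below does this on an L x L torus with $n = L^2$; the function names and the choice of torus are assumptions for illustration.

```python
from fractions import Fraction

def count_block_positions(L, rows=2, cols=3):
    """Count block placements that contain v, or touch v without containing it."""
    v = (0, 0)
    nbrs = {(1, 0), (L - 1, 0), (0, 1), (0, L - 1)}
    contains_v = touches_v = 0
    for r0 in range(L):
        for c0 in range(L):
            block = {((r0 + dr) % L, (c0 + dc) % L)
                     for dr in range(rows) for dc in range(cols)}
            if v in block:
                contains_v += 1
            elif block & nbrs:
                touches_v += 1
    return contains_v, touches_v

def expected_distance_bound(n, contains_v, touches_v, h_max):
    """Bound on E[# differing vertices]: 0 if v in S, 1 + h_max on the rim, else 1."""
    far = n - contains_v - touches_v
    return (Fraction(far) + Fraction(touches_v) * (1 + h_max)) / n

if __name__ == "__main__":
    L = 8
    inside, rim = count_block_positions(L)
    print(inside, rim)
    print(expected_distance_bound(L * L, inside, rim, Fraction(52, 100)))
```

With $n = 64$ the bound evaluates to $1 - 0.8/64 = 79/80$, matching the expression $1 - 0.8/n$ in the proof.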

Of course, we still need to prove that the desired couplings exist for each
possible $C_X, C_Y$. These couplings were found with the aid of computer programs.
In principle, for any pair $C_X, C_Y$, searching for the coupling that minimizes
$H(C_X, C_Y)$ subject to (1) is simply a matter of solving a linear program and
so can be done in polynomial time. However, the number of variables is $|C_X| \times |C_Y|$,
which can be as high as roughly $(5^6)^2$. Furthermore, the number of possible
pairs $X, Y$ is roughly $6^{10}$, and even after eliminating pairs which are redundant
by symmetry, it is enormous. Thus, finding these couplings is computationally
intensive. To help, we designed a fast heuristic which, rather than finding the
best coupling for a particular pair, just found a very good coupling, i.e. one that
satisfied (2). The code used can be found at www.cs.toronto.edu/~fvb. We
provide a more detailed description in the next section.



3 The Programs Used 

Method of the computation: Let $R$ denote the rim vertices, that is, those
vertices which are adjacent to but outside of the subgrid $S$. We call a (not
necessarily proper) coloring of the vertices of $R$ a rim coloring. For each possible
pair of rim colorings $X, Y$ which differ only at a vertex $v \in R$, we need to find a
coupling between the extensions $C_X$ and $C_Y$ of $X, Y$ to $S$, so that the couplings
satisfy (1) and (2). These couplings were found with a small suite of programs
working in two phases. In the first phase, exhaustive lists of pairs of rim colorings
(reduced by equivalence with respect to allowable grid colorings) were generated.
In the second phase, for each pair $X, Y$, we generated $C_X, C_Y$ separately; these
were then coupled, satisfying (1), in a (close to) optimal way to obtain a bound
on $H(C_X, C_Y)$ that satisfies (2).

Implementation: All programs take the following parameters: number of
colors, grid dimensions, and an integer denoting the position of the distinguished
vertex $v$ with respect to the grid (0 if adjacent to the corner, $+i$ if adjacent to
the $i$-th vertex along the top of the grid, and $-i$ if adjacent to the $i$-th vertex
along the side of the grid). If one specifies a rim coloring $X$ and the position
of $v$, then this determines $Y$ (up to equivalence via permutation of colors). For
each coloring $X$ we determined a good coupling for each non-equivalent position
of $v$.

By default the programs generate rim colorings and couplings on the assump- 
tion that the subgrid is not on the boundary of the supergrid (i.e. all rim vertices 
potentially constrain the allowable subgrid colorings). Boundary cases, however, 
can easily be simulated by using values in the rim colorings that are outside the 
range determined by the number-of-colors parameter. Grids on the boundary 
were only checked when the analysis of the non-boundary grids yielded promis- 
ing values; in all cases we found that the maximum cost for boundary subgrids 
was lower than for the associated non-boundary subgrids. 

Generating rim colorings: Since the calculations required for phase 2 were
much more time-consuming than those for phase 1, the rim coloring generation
procedure was designed to minimize the number of colorings output rather than
the time used generating them. A rim coloring is represented by a vector of
the colors used on the rim, starting from the distinguished vertex $v$ and going
clockwise around the subgrid. Since we can assume by symmetry that the color
used for $v$ is 0 in $X$ and 1 in $Y$, 0|1 is always the first element (0 is used in the
actual output, 1 is understood). The following reductions were applied to avoid
equivalent rim colorings: reduction by color isomorphism (colors 2 and above),
by exchange of colors 0 and 1, by exchange of colors of vertices adjacent to the
corners of the subgrid, and by application of flip symmetries where applicable.
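The first of these reductions, color isomorphism on colors 2 and above, can be sketched as a canonical relabeling: colors 0 and 1 are kept fixed and every other color is renamed in order of first appearance along the rim vector. This is only one plausible way to realize that reduction, and the function name is hypothetical; the other reductions (0/1 exchange, corner exchanges, flips) are not shown.

```python
def canonical_rim(rim):
    """Relabel colors >= 2 in order of first appearance; keep 0 and 1 fixed."""
    relabel = {0: 0, 1: 1}
    next_color = 2
    out = []
    for c in rim:
        if c not in relabel:
            relabel[c] = next_color
            next_color += 1
        out.append(relabel[c])
    return tuple(out)

if __name__ == "__main__":
    # Two rim vectors that differ only by renaming colors 2..5.
    print(canonical_rim((0, 5, 3, 5, 1, 2)))
    print(canonical_rim((0, 2, 4, 2, 1, 3)))
```

Two rim colorings that are isomorphic in this sense map to the same tuple, so keeping one representative per canonical form removes the duplicates.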

Finding a coupling for particular rim colorings: Two programs were
used for each rim coloring $X$ and position $i$ of $v$. In each, the initial operation
is the generation of all compatible grid colorings; this is done separately for
col($v$) = 0 and col($v$) = 1 (i.e. for $C_X, C_Y$). The first program creates a set of
linear programming constraints that is readable by the program lp_solve (by
Michel Berkelaar of the Eindhoven University of Technology; it is available with
some Linux distributions). As mentioned above, time and space requirements
made use of this procedure feasible only for checking individual rim colorings,
and even then the subgrid size had to be fairly modest. The second program
calculates an upper bound on the optimal cost using a greedy algorithm to
create a candidate coupling. Given sets of colorings $C_X$ and $C_Y$ (of size $m_X$ and
$m_Y$ respectively), the algorithm starts by assigning "unused" probabilities $1/m_X$
and $1/m_Y$ to the individual colorings. Then, for each distance $d = 0, 1, \ldots, n$, for
each coloring $c_1$ in $C_X$ it traverses $C_Y$ looking for a coloring $c_2$ which differs from
$c_1$ on exactly $d$ vertices. When such a $c_2$ is found it removes the coloring with
the lower unused probability $p$ from its list and reduces the unused probability
$p'$ of the other to $p' - p$; the distance $d \cdot p$ is added to the total distance so far.
The order in which the lists of colorings $C_X$ and $C_Y$ are traversed does affect the
solution, so an optional argument is available that allows the user to select one
of several alternatives.
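The greedy procedure described above can be sketched as follows. This is a reading of the prose, not the authors' program: colorings are tuples, exact fractions are used for the unused probabilities, the traversal order is simply list order, and the function names are hypothetical.

```python
from fractions import Fraction

def hamming(c1, c2):
    """Number of positions on which two colorings differ."""
    return sum(a != b for a, b in zip(c1, c2))

def greedy_coupling(CX, CY):
    """Return (cost, plan), where plan[(c1, c2)] is the joint mass of the pair."""
    size = len(CX[0])
    unused_x = {c: Fraction(1, len(CX)) for c in CX}
    unused_y = {c: Fraction(1, len(CY)) for c in CY}
    plan, cost = {}, Fraction(0)
    for d in range(size + 1):              # match closest pairs first
        for c1 in CX:
            for c2 in CY:
                if c1 not in unused_x:
                    break                   # c1 has no unused mass left
                if c2 not in unused_y or hamming(c1, c2) != d:
                    continue
                p = min(unused_x[c1], unused_y[c2])
                plan[(c1, c2)] = plan.get((c1, c2), Fraction(0)) + p
                cost += d * p
                unused_x[c1] -= p
                unused_y[c2] -= p
                if unused_x[c1] == 0:
                    del unused_x[c1]
                if unused_y[c2] == 0:
                    del unused_y[c2]
    return cost, plan

if __name__ == "__main__":
    # Toy 1 x 1 "subgrid": X allows colors {1, 5}, Y allows colors {0, 5}.
    CX, CY = [(1,), (5,)], [(0,), (5,)]
    cost, plan = greedy_coupling(CX, CY)
    print(cost, plan)
```

In the toy example the shared coloring is matched at distance 0 and the remaining mass at distance 1, giving cost 1/2. The joint mass on $(c_1, c_2)$ equals $p_{c_1}(c_2)/|C_X|$, so the column sums of `plan` being $1/|C_Y|$ is exactly condition (1).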
This heuristic does not guarantee an optimal solution, and with some grids
and particular rim colorings the coupling it generates is far from the best.
However, for the rim colorings we are most interested in (ones where $H(C_X, C_Y)$ is
high for all couplings) it seems to consistently give results that are optimal or
very close (within 2%). We cannot give a rigorous bound on the running time,
but a cursory analysis and empirical evidence suggest that it runs in roughly
$O(m \log m)$ time, where $m$ is the number of compatible grid colorings. Because
the heuristic is so much faster than the LP solver, our general procedure was as
follows: (1) Use the heuristic with the default traversal order to calculate bounds
on the expected distance for all the rim colorings generated in phase one. (2)
When feasible, use the LP solver on those rim colorings that had the highest
value of $H(C_X, C_Y)$, to obtain an exact value for the maximum. (3) For larger
grids / more colors than could be comfortably handled by the LP solver, use all
available traversal orders on those rim colorings that had the maximum value of
$H(C_X, C_Y)$ to obtain as tight a bound as possible within a feasible time.

Results of the computations: Computations were run on various grid
sizes and numbers of colors in order to check the correctness of the programs,
and also to collect data which could be used to estimate running times and
maximum expected distance for larger subgrid dimensions. For 7 and 8 colorings
our results corresponded well with previous work on the problem (e.g. [2]).

For 6 colorings, we checked $1 \times k$ subgrids for $k \le 5$, as well as $2 \times 2$ and
$2 \times 3$ subgrids; for all but the last of these the maximum expected distance we
obtained was too large to give us rapid mixing. The $2 \times 3$ subgrid has 2
non-equivalent positions with respect to the rim, the corner (position 0, 8 rim vertices
are adjacent to this position) and the side (position 1, 2 rim vertices adjacent).
For each $X, Y$ with $v$ in position 0, we obtained a coupling satisfying:

$$H(C_X, C_Y) \le 0.5118309760.$$

For each $X, Y$ with $v$ in position 1, we obtained a coupling satisfying:

$$H(C_X, C_Y) \le 0.4837863092.$$

Thus, in each case we satisfy (2), as required.

A slightly stronger output: By examining the problem a bit more closely,
we see that condition (2) is sufficient, but not necessary, for our purposes. Let
$H_i$ denote the maximum of $H(C_X, C_Y)$ over the couplings found for all pairs
$X, Y$ where $v$ is in position $i$, and let $\mathit{mult}_i$ denote the number of rim vertices
adjacent to position $i$. Then, being more careful about the calculation used in
the proof of Theorem 1, and extending it to a general $a \times b$ subgrid, we see that
the overall expected number of vertices on which $X', Y'$ differ is at most

$$1 \times (n - 2(a+b) - ab)/n + 0 \times ab/n + \sum_i \mathit{mult}_i \times (1 + H_i)/n,$$

a smaller value than that used in the proof of Theorem 1, where we (implicitly)
used $(\max_i H_i) \times \sum_i \mathit{mult}_i$ rather than $\sum_i \mathit{mult}_i \times H_i$. Our programs actually
compute this smaller value. Even so, we could not obtain suitable couplings for
any grid size smaller than $2 \times 3$.
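As a worked instance of this refined calculation, the sketch below plugs in the two bounds reported for the $2 \times 3$ subgrid. It assumes the same accounting as the proof of Theorem 1: a block containing $v$ contributes 0, a block with $v$ on its rim at position $i$ contributes $1 + H_i$, and every other block contributes 1. The function name and this reading of the formula are assumptions.

```python
def contraction_slack(a, b, mult, H):
    """Return c such that the expected distance after one step is at most 1 - c/n."""
    assert sum(mult.values()) == 2 * (a + b)   # the rim has 2(a+b) vertices
    rim_total = sum(mult[i] * (1 + H[i]) for i in mult)
    return a * b + 2 * (a + b) - rim_total

if __name__ == "__main__":
    mult = {0: 8, 1: 2}                         # corner and side positions
    H = {0: 0.5118309760, 1: 0.4837863092}      # bounds reported in the text
    print(contraction_slack(2, 3, mult, H))
```

Under these assumptions the slack comes out near 0.94, slightly better than the 0.8 obtained from the uniform bound 0.52 in the proof of Theorem 1.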







4 Rapid Mixing: Glauber and Kempe Chain Dynamics 

In this section we prove Theorem 2, showing rapid mixing for the Glauber and
Kempe chain dynamics, by following the techniques and presentation of Randall
and Tetali [17].

Suppose $Q$ is a Markov chain whose mixing time we would like to bound,
and $\tilde{Q}$ is another Markov chain for which we already have a bound. Let $E$ and $\tilde{E}$
denote the edges of these Markov chains, i.e. the pairs $(x, y)$ such that the
transition probabilities $Q(x,y)$ and $\tilde{Q}(x,y)$, respectively, are positive. Now, for
each edge of $\tilde{Q}$, i.e. each $(x, y) \in \tilde{E}$, choose a fixed path $\gamma_{x,y}$ using the edges of $Q$:
that is, choose a series of states $x = x_1, x_2, \ldots, x_k = y$ such that $(x_i, x_{i+1}) \in E$
for $1 \le i < k$. Denote the length of such a path $|\gamma_{x,y}| = k$. Furthermore, for each
$(z,w) \in E$, let $\Gamma(z,w) \subseteq \tilde{E}$ denote the set of pairs $(x,y)$ such that $\gamma_{x,y}$ uses the
edge $(z,w)$. Finally, let

$$A = \max_{(z,w) \in E} \left\{ \frac{1}{Q(z,w)} \sum_{(x,y) \in \Gamma(z,w)} |\gamma_{x,y}|\, \tilde{Q}(x,y) \right\}.$$



Note that A depends on our choice of paths. 

By combining bounds on the mixing time in terms of the spectral gap [6,20,
19] with an upper bound on $\tilde{Q}$'s spectral gap in terms of $Q$'s due to Diaconis
and Saloff-Coste [5], we obtain the following upper bound on $Q$'s mixing time:

Theorem 4. Let $Q$ and $\tilde{Q}$ be reversible Markov chains on $q$-colorings of a graph
of $n$ vertices whose unique stationary distribution is the uniform distribution. Let
$\tilde\lambda_1$ be the largest eigenvalue of $\tilde{Q}$'s transition matrix smaller than 1, let $\tau_\epsilon$ and
$\tilde\tau_\epsilon$ denote the $\epsilon$-mixing time of $Q$ and $\tilde{Q}$ respectively, and define $A$ as above. Then
for any $\epsilon < 1/4$,

$$\tau_\epsilon \le \frac{4 \log q}{\tilde\lambda_1}\, A\, n\, \tilde\tau_\epsilon.$$



We omit the proof. The reason for the additional factor of $n$ is the fact that
the upper and lower bounds on mixing time in terms of the spectral gap are
$\log \pi(x)^{-1}$ apart, where $\pi$ is the uniform distribution. Since there are at most
$q^n$ colorings, we have $\log \pi(x)^{-1} \le n \log q$. On the grid, it is easy to see that
there are an exponentially large number of $q$-colorings for $q \ge 3$, so removing
this factor of $n$ would require a different comparison technique.

Now suppose that $\tilde{Q}$ is the block dynamics and $Q$ is the Glauber or Kempe
chain dynamics. We wish to prove Theorem 2 by showing that $\tau_\epsilon = O(n \tilde\tau_\epsilon)$. By
adding self-loops with probability greater than 1/2 to the block dynamics, we
can ensure that the eigenvalues of $\tilde{Q}$ are positive with only a constant increase
in the mixing time. Therefore, it suffices to find a choice of paths for which $A$
is constant. Since, for all three of these Markov chains, each move occurs with
probability $\Theta(1/n)$, if $|\gamma_{x,y}|$ and $|\Gamma(z,w)|$ are constant then so is $A$.

In fact, for $q \ge \Delta + 2$, we can carry out a block move on any finite
neighborhood with Glauber moves. We need to flip each site $u$ in the block to its new
color; however, $u$'s flip is blocked by a neighbor $v$ if $v$'s current color equals $u$'s
new color. Therefore, we first prepare for $u$'s flip by changing $v$ to a color which
differs from $u$'s new color as well as that of $v$'s $\Delta$ neighbors. If the neighborhood
has $m$ sites, this gives $|\gamma_{x,y}| \le m(\Delta + 1)$, or $|\gamma_{x,y}| \le 30$ for $M(2,3)$. (With a
little work we can reduce this to 13.)

For the Kempe chain dynamics, recall that each move of the chain chooses 
a vertex u and a color b other than u’s current color. If b is the color which 
the Glauber dynamics would assign to u, then none of u’s neighbors are colored 
with b, and the Kempe chain move is identical to the Glauber move. Since this 
happens with probability 1/q, the above argument applies to Kempe chain moves 
as well, and again we have $|\gamma_{x,y}| \le m(\Delta + 1)$. Moreover, we only need to consider
moves that use Kempe chains of size 1. 

Finally, since each site appears in only m = 6 blocks, the number of block 
moves that use a given Glauber move or a given Kempe chain move of size 1 is 
bounded above by m times the number of pairs of colorings of the block. Thus 
$|\Gamma(z,w)|$ is bounded above by a constant, and we are done.

An interesting open question is whether we can prove optimal temporal mix- 
ing for the Glauber or Kempe chain dynamics. One possibility is to use log- 
Sobolev inequalities as in [4]. We leave this as a direction for further work. 

5 Conclusion: Can We Reach Smaller q? 

We have run our programs on 2 x 4 and 3 x 3 subgrids to see if we could achieve
rapid mixing on 5 colors, but in both cases the largest values of $H(C_x, C_y)$
were too high. It does seem likely that rapid mixing on 5 colors is possible by
recoloring a 3 x 4 subgrid, based on the decrease of the ratio of $\max E[\mathrm{dist}] \cdot |R|$
to $|A|$ as the dimensions increase; similar reasoning leads us to believe that rapid
mixing using a 2 x k subgrid is possible, but we would probably need a 2 x 10
grid or larger to achieve success. Unfortunately, doing the calculations for 3 x 4 
is a daunting proposition. The problem is exponential in two directions at once 
(number of rim colorings, and number of grid colorings for each rim coloring), 
so we get huge increases in running time when we move up a level. 

Acknowledgments. We are grateful to Leslie Goldberg, Dana Randall and Eric 
Vigoda for helpful discussions, and to an anonymous referee for pointing out a 
technical error. Recently, L. Goldberg, R. Martin and M. Paterson have reported 
obtaining the main result of this paper independently. 

Abstract. We identify two properties that for P-selective sets are effectively computable.
Namely we show that, for any P-selective set, finding a string that is in a
given length’s top Toda equivalence class (very informally put, a string from $\Sigma^n$
that the set’s P-selector function declares to be most likely to belong to the set)
is $\mathrm{FP}^{\Sigma_2^p}$ computable, and we show that each P-selective set contains a
weakly-$\mathrm{P}^{\Sigma_2^p}$-rankable subset.

1 Introduction 

P-selectivity is a generalization of P. A set A is in P if there is a polynomial-time algorithm 
which, given any string x, determines whether x belongs to A. In contrast, a set A is 
P-selective if there is a polynomial-time algorithm (called a P-selector function) that, 
given any two strings x and y, outputs one of those strings, and such that the algorithm 
has the property that if at least one of x or y is in A, then the one the algorithm outputs
belongs to A. Informally, it always places a bet on one of them being in the set, and it 
wins whenever winning such a bet is possible. 

The book [18] provides a recent overview of the state of research regarding P- 
selectivity theory (see also the somewhat older article [2]). Nickelsen’s thesis [24] and 
the recent survey article by Nickelsen and Tantau [25] are also very good starting points 
regarding the study of partial information classes such as the P-selective sets. 

A key notion used in P-selectivity theory is the notion of a Toda equivalence class 
which, very loosely put, is a strongly connected component of the graph induced by 
a given P-selector function on the strings of a given length. This paper studies the 
complexity of finding a string from a given P-selector function’s top Toda equivalence
class — that is, a string from the unique strongly connected component that can be reached 
from no other strongly connected component. 

P-selectivity theory has many features making it an interesting complexity-related 
research area. P-selectivity represents a natural generalization of feasible decidability, it 
has a well-studied analog in computability theory, namely, the semi-recursive sets [21];
the exploration of P-selectivity had a strong, unexpected impact on the study of NP 
functions, solutions and ambiguity, namely, the nondeterministic version of selectivity 
is the central tool used to show that SAT has unique solutions only if the polynomial 
hierarchy collapses [12]; and, relatedly, selectivity has proven central in understanding 
more generally whether one can reduce the number of solutions of NP functions [12,26, 
23,15]. 

Informally, P-selectivity captures the notion of sets for which there is a polynomial-time
algorithm f telling which of any two given elements is “logically no less likely to
be in the set” (see Definition 2.1). Such sets are called P-selective. P-selective sets can
be arbitrarily complex: For every tally set A, there is a P-selective set that is
Turing-equivalent to A [27,28]. In particular, some P-selective sets are uncomputable. Despite
this, in the present paper we identify natural tasks that are computable for all P-selective
sets. Indeed, these natural tasks are even computable within relatively low levels of the
polynomial hierarchy.

The first such task is to produce, for an arbitrary P-selective set A (which, w.l.o.g., has
at least one commutative P-selector function), at each length n, a string that is “most
likely” to be in A. Let us explain a bit more what we mean by this. Each commutative
polynomial-time P-selector function f for A will implicitly specify a structure of equivalence
classes of strings (at a given length) that are equally likely (according to f) to be
in A, and these classes can be ordered with respect to the order that one might informally
call no-less-likely-to-be-in-A (we will explain how to define such equivalence classes,
which themselves depend on only the P-selector function and not on A, in rigorous detail
in Section 2 after introducing the tournament-graph model that is useful in their definition;
in brief, two strings of the same length are said to be equivalent exactly if there are
chains of applications of the P-selector function leading from each to the other). We will
call those classes Toda equivalence classes, in light of Toda [33], and the related order
will be called a Toda order. Informally put, a Toda equivalence class C, with respect to
commutative P-selector function f, of length n strings has the property that for every
set A for which f is a P-selector, either all the strings in C are in A or none of the strings
in C are in A. And (restricting ourselves as we will globally do to just looking at strings
all of the same length) each Toda equivalence class is a maximal set of strings for which
this can be said.

The Toda-class approach’s ordering implications play a central role in a wide range
of results, ranging for example from the study of whether P-selective sets can be hard
for standard complexity classes [33] to the study of associative P-selector functions [9,
10]. In this paper, we seek to better understand the Toda classes’ own complexity. We
show that finding an element in the top Toda class of a length (very informally and
intuitively, and not quite correctly, put: a string that among the strings of that length
is “most likely” to be in A) can be done with an $\mathrm{FP}^{\Sigma_2^p}$ computation.

The second task we study is that of weak-P-rankability. A function f weakly ranks
a set A if, for any string x that is in A, f(x) returns the rank of x in A; in other words,
it says how many strings lexicographically less than or equal to x are in A. A set A is
weakly-P-rankable if it has a function f that is computable in polynomial time and that
weakly ranks A. The relation between P-selectivity and weak-P-rankability has been 
studied extensively by Hemaspaandra, Zaki, and Zimand in a paper [19] that focused
on polynomial-time semi-rankable sets, i.e., sets that are simultaneously P-selective and
weakly-P-rankable. It is shown there that there are P-selective sets that are not
weakly-P-rankable. In partial contrast, in the present paper we show that any infinite P-selective
set has an infinite subset that is weakly rankable by an $\mathrm{FP}^{\Sigma_2^p}$ function. We also obtain
a result about the relationship between P-selectivity and weak-P-rankability. All the
P-selective sets that have been considered in the literature are either standard left cuts or are
structurally similar to a standard left cut (more precisely, $\le_m^p$-reducible to a standard left
cut). We show that if a standard left cut is weakly-P-rankable, then it is in P. Regarding
this section (Section 3), we particularly commend to the reader’s attention the proof of
Lemma 3.3 (omitted here; see the full version), which we feel to be a novel technique, and
which is given prompt application in yielding the theorems that are stated immediately
after it.

For space reasons, all proofs and much discussion/explanation are omitted here; please
see the full version [16].

2 Instantiating the Top Toda Class 

Let $\mathbb{N} = \{0, 1, 2, \ldots\}$ and let $\mathbb{N}^+ = \{1, 2, 3, \ldots\}$. Our alphabet will be $\Sigma = \{0, 1\}$.
For any set $A$, $\|A\|$ denotes the cardinality of $A$. For any string $x \in \Sigma^*$, $|x|$ denotes
the length of $x$. For any set $A$ and any nonnegative integer $n$, $A^{=n}$ denotes
$\{x \mid x \in A \wedge |x| = n\}$. For any set $A$ and any string $x$, $A^{\le x}$ denotes $\{y \mid y \in A \wedge y \le_{lex} x\}$,
where $\le_{lex}$ denotes $\le$ with respect to the standard lexicographical ordering. For the
definitions of standard complexity classes such as P, NP, $\Sigma_2^p$, etc., we refer the reader
to, for example, the handbook [14]. As is standard, FP denotes the class of all (total,
single-valued) polynomial-time computable functions, $\Sigma_2^p = \mathrm{NP}^{\mathrm{NP}}$, $\Sigma_3^p = \mathrm{NP}^{\Sigma_2^p}$,
$\mathrm{E} = \bigcup_{k>0} \mathrm{DTIME}[2^{kn}]$, and $\mathrm{NE} = \bigcup_{k>0} \mathrm{NTIME}[2^{kn}]$. Given classes $\mathcal{C}$ and $\mathcal{D}$, as
is standard we say that $\mathcal{C}$ is $\mathcal{D}$-immune (equivalently, $\mathcal{C}$ is immune to $\mathcal{D}$) if there is an
infinite set $A \in \mathcal{C}$ such that no infinite subset of $A$ is a member of $\mathcal{D}$.

Definition 2.1. [27,29] A set A is P-selective if there is a (total, single-valued)
polynomial-time computable function f such that, for every x and y, it holds that

1. $f(x, y) = x$ or $f(x, y) = y$, and

2. $\{x, y\} \cap A \ne \emptyset \implies [(x \in A \text{ and } f(x, y) = x) \text{ or } (y \in A \text{ and } f(x, y) = y)]$.

We use P-sel to denote the class of all sets that are P-selective.

The function f appearing in Definition 2.1 is called a P-selector or a P-selector
function. A P-selector (function) f is commutative if it has the property that for
all x and y in $\Sigma^*$, $f(x, y) = f(y, x)$. Each P-selective set A has a commutative
P-selector function, because we can replace an arbitrary P-selector f for A with
$f'(x, y) = f(\min(x, y), \max(x, y))$. It is easy to see that $f'$ is a commutative P-selector
for A. Since all P-selective sets have commutative P-selector functions, it is very common
in the literature to focus on commutative P-selector functions, and we do so here.
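
The symmetrization $f'(x, y) = f(\min(x, y), \max(x, y))$ can be tried out on a small example. The following Python sketch is only an illustration under stated assumptions: the names `value`, `lex_key`, `biased_selector`, `make_commutative`, and `is_selector_for` are hypothetical, the selector is the usual "pick the smaller dyadic value" rule for a standard left cut, and ties are deliberately broken by argument order so that the unwrapped selector is not commutative.

```python
from fractions import Fraction
from itertools import product


def value(s):
    """Dyadic value 0.s of a binary string (the empty string has value 0)."""
    return sum(Fraction(int(b), 2 ** i) for i, b in enumerate(s, start=1))


def lex_key(s):
    """Key for the standard lexicographic order: by length, then by content."""
    return (len(s), s)


def biased_selector(x, y):
    """Selector for a standard left cut; ties go to the FIRST argument,
    so it is not commutative (for example on "0" and "00")."""
    return x if value(x) <= value(y) else y


def make_commutative(f):
    """Return f'(x, y) = f(min(x, y), max(x, y))."""
    def f_prime(x, y):
        lo, hi = sorted((x, y), key=lex_key)
        return f(lo, hi)
    return f_prime


def is_selector_for(f, in_set, universe):
    """Brute-force check of the two selector conditions on a finite universe."""
    for x in universe:
        for y in universe:
            out = f(x, y)
            if out not in (x, y):
                return False
            if (in_set(x) or in_set(y)) and not in_set(out):
                return False
    return True


# All binary strings of length at most 3 (illustrative universe).
UNIVERSE = ["".join(b) for n in range(4) for b in product("01", repeat=n)]
GAMMA = Fraction(1, 3)  # arbitrary illustrative cut point


def in_cut(s):
    """Membership in the (illustrative) left cut of GAMMA."""
    return value(s) < GAMMA


f_comm = make_commutative(biased_selector)
```

On this finite universe both `biased_selector` and `f_comm` pass `is_selector_for`, while only `f_comm` returns the same string for both argument orders.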
A tournament graph $G = (V_G, E_G)$ is a complete oriented graph, i.e., a directed graph
having the property that, for every two (possibly equal) nodes $a$ and $b$,
$\|\{(a, b), (b, a)\} \cap E_G\| = 1$. A commutative P-selector $f$ induces a tournament graph $G_f$ on $\Sigma^*$. The
nodes are the strings in $\Sigma^*$ and there is an edge $(x, y)$ (also denoted $x \ge_f y$) exactly
if $f(x, y) = x$. When speaking in plain text, we will use synonymously with this the
terms x beats y, x wins at y, and y loses to x. (Note that each x both wins at itself and
loses to itself, and our tournament graphs have self-loops at each node.) We say $x >_f y$
if $x \ge_f y$ and $x \ne y$.

Let $G_{f,n}$ be the subgraph of $G_f$ induced by the nodes in $\Sigma^n$. Two nodes $x, y \in \Sigma^n$
are Toda-equivalent, notated $x \equiv_{Toda} y$, if $G_{f,n}$ contains a path from $x$ to $y$ and a
path from $y$ to $x$. The relation $\equiv_{Toda}$ is an equivalence relation. We will denote the
equivalence class of $x \in \Sigma^n$ by $[x]_f$. For strings $x$ and $y$ of the same length, we order
their equivalence classes as follows: $[x]_f \ge [y]_f$ holds exactly if $x \ge_f y$. For strings $x$
and $y$ of the same length, we say that $[x]_f > [y]_f$ exactly if $[x]_f \ge [y]_f$ and $[x]_f \ne [y]_f$.

(It is easy to see that these relations are consistent when applied only among strings
all of the same length. Note that we neither define nor ever use $[w]_f \ge [z]_f$ or $[w]_f > [z]_f$
for the case where $|w| \ne |z|$. Our focus will always be on collections of same-length
strings. However, if one wanted to compare different-length strings, for the purposes of
this paper let us say that equivalence classes of different lengths are always, by definition,
incomparable, and so viewed as being over all of $\Sigma^*$ our classes form a partial rather than
a total order. Since our focus is within a length and there our order is never undefined,
we will for simplicity simply use the term “order.”)

Of course, $[y]_f \le [x]_f$ means the same as $[x]_f \ge [y]_f$, and $[y]_f < [x]_f$ means the
same as $[x]_f > [y]_f$.

The following fact holds. 

Fact 2.2. 

1. If A is a P-selective set having commutative function f as a P-selector then, for all
$x \in \Sigma^n$, $A \cap [x]_f = [x]_f$ or $A \cap [x]_f = \emptyset$.

2. Let A be a P-selective set having commutative function f as a P-selector. If
$A \cap [x]_f = [x]_f$ then for all $y$ such that $|y| = |x|$ and $[y]_f \ge [x]_f$, it holds that
$A \cap [y]_f = [y]_f$. (In fact, it even holds that if $A \cap [x]_f = [x]_f$ then for all $y$
(regardless of $|y|$) such that $f(x, y) = y$ it holds that $A \cap [y]_f = [y]_f$.)

These equivalence classes (related to strings of length n) form a partition of $\Sigma^n$.
In particular, there are strings $x_1^n, x_2^n, \ldots, x_k^n \in \Sigma^n$ such that the equivalence classes
$[x_1^n]_f, \ldots, [x_k^n]_f$ form a partition of $\Sigma^n$ and $[x_1^n]_f > [x_2^n]_f > \cdots > [x_k^n]_f$. The set
$[x_1^n]_f$ is called the top Toda class (at length n with respect to f).
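
To make the ordering of Toda classes concrete, the following Python sketch computes them by brute force for a tiny tournament. It is an illustration only: the table `WINS` and the names `toy_selector`, `reachable`, and `toda_classes` are hypothetical, and the exhaustive search takes time exponential in n, so it says nothing about the complexity bounds discussed in this paper.

```python
from itertools import product

# Hypothetical toy tournament on length-2 strings: 00 -> 01 -> 10 -> 00 is a
# cycle, and each of those three strings beats 11.
WINS = {("00", "01"), ("01", "10"), ("10", "00"),
        ("00", "11"), ("01", "11"), ("10", "11")}


def toy_selector(x, y):
    """Commutative selector: returns the winner of the pair {x, y}."""
    if x == y:
        return x
    return x if (x, y) in WINS else y


def reachable(start, nodes, f):
    """All nodes reachable from `start` along edges (u, v) with f(u, v) == u."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes:
            if v not in seen and f(u, v) == u:
                seen.add(v)
                stack.append(v)
    return seen


def toda_classes(n, f):
    """Toda classes at length n, listed from the top class downward."""
    nodes = ["".join(bits) for bits in product("01", repeat=n)]
    reach = {x: reachable(x, nodes, f) for x in nodes}
    # x and y are Toda-equivalent exactly if each is reachable from the other.
    classes = {frozenset(y for y in reach[x] if x in reach[y]) for x in nodes}
    # A higher class reaches strictly more strings than a lower one.
    return sorted(classes, key=lambda c: len(reach[min(c)]), reverse=True)
```

For the toy tournament, `toda_classes(2, toy_selector)` lists the three-element cycle first (the top Toda class) and the singleton containing 11 second.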

Definition 2.3. Let f be a commutative P-selector function. A function g is a
BestAtLength function (for f) if, for each n, $g(1^n) \in [x_1^n]_f$, i.e., it outputs an
element of the top Toda class at length n with respect to f.

Each string that belongs to the top Toda class (at length n with respect to f) will be
called a top Toda element (at length n with respect to f).
Note that Definition 2.3 does not mention what set f is a P-selector for, since each
BestAtLength function will work equally well for each of the potentially uncountably
many sets for which f is a P-selector.

Theorem 2.4. Every commutative P-selector f has a BestAtLength function computable
in $\mathrm{FP}^{\Sigma_2^p}$.

It is natural to ask if there is a more efficient BestAtLength function. The next result 
shows that it is unlikely that the BestAtLength function of every commutative P-selector 
is in FP because this would imply E = NE. The proof is relativizable, so it follows that 
there is an oracle relative to which the BestAtLength function for some commutative P- 
selector is not in FP. The issue of whether the statement of Theorem 2.4 can be improved 
to FP remains open. 

Theorem 2.5. 1. If P = NP, then every commutative P-selector f has a BestAtLength
function in FP.

2. If every commutative P-selector f has a BestAtLength function in FP then E = NE.

Since there are relativized worlds in which E and NE differ, Theorem 2.5 (in its rela- 
tivized version, which also holds) yields the following corollary. 

Corollary 2.6. There is an oracle relative to which there is a commutative P-selector 
f such that no BestAtLength function for f is in FP. 

Theorem 2.5 relativizes. That is, for each set $B$ it holds that: If every commutative
$\mathrm{P}^B$-selector $f$ has a BestAtLength function in $\mathrm{FP}^B$ then $\mathrm{E}^B = \mathrm{NE}^B$. Though neither
$\mathrm{NP} \cap \mathrm{coNP}$ nor $\mathrm{coNE} \cap \mathrm{NE}$ is currently known to have complete sets (see [31,4,5] regarding
$\mathrm{NP} \cap \mathrm{coNP}$), one nonetheless can (by a “set by set” argument; [11, Corollary 6] for
example employs a set-by-set argument in the completely different setting of
Karp-Lipton-type results), in light of the facts that $\mathrm{E}^{\mathrm{NP} \cap \mathrm{coNP}} = \mathrm{coNE} \cap \mathrm{NE}$ and
$\mathrm{NE} = \mathrm{coNE} \cap \mathrm{NE} \iff \mathrm{NE} = \mathrm{coNE}$, see as an application of this that the following holds.

Corollary 2.7. If every commutative $\mathrm{P}^{\mathrm{NP} \cap \mathrm{coNP}}$-selector $f$ has a BestAtLength function
in $\mathrm{FP}^{\mathrm{NP} \cap \mathrm{coNP}}$, then $\mathrm{NE} = \mathrm{coNE}$.

The reader may naturally wonder whether the E = NE conclusion of part 2 of
Theorem 2.5 can be strengthened to a P = NP conclusion. We do not have a definitive
answer; the fact that BestAtLength functions have tally inputs seems a difficult
impediment to proving this. We do note that it is the only impediment. That is, if we make a
new notion of BestAtLength function, let us call it BestBelowUsAtLength, that (for a
fixed commutative P-selector function f) takes as its input $(1^n, z)$ and (i) when $|z| \ne n$
outputs “illegal z”; and (ii) when $|z| = n$ and $[z]_f$ is not in the bottom Toda equivalence
class (with respect to f at length n) outputs a string $w$, $|w| = n$, such that $[w]_f$ is the
topmost Toda equivalence class (with respect to f at length n) that is less than $[z]_f$ (i.e.,
such that $[z]_f > [w]_f$ yet, for each $v \in \Sigma^n$, it holds that $[w]_f < [v]_f \implies [v]_f \ge [z]_f$).

(When $|z| = n$ and $[z]_f$ is in the bottom Toda equivalence class (with respect to f at
length n), a BestBelowUsAtLength function can output whatever lie it likes.) Under this
definition, it is easy to see that a P = NP conclusion holds.

Proposition 2.8. If every commutative P-selector f has a BestBelowUsAtLength func- 
tion in FP then P = NP. 
An element x in the top Toda class at length n has the property that for every string
$y \in \Sigma^n$ there is a path from x to y in the induced tournament graph. We next investigate a
related notion. It is known (this was noted as early as the 1950s [22]) that in a tournament
graph there is at least one node v such that for every other node w there is a path of length
at most 2 from v to w. This property has played an important role in improving from
quasilinear to linear the amount of nondeterminism used to accept the P-selective sets
with optimal nonuniform advice ([13], see also [17]) and in understanding the complexity
of the reachability problem in tournaments [32]. A node v with the above property is
called a king. A function g is a Find-A-King function for a commutative P-selector f if
for all n, on input $1^n$, g outputs a king of the graph $G_{f,n}$.

Proposition 2.9. Every commutative P-selector function f has a Find-A-King function
computable in $\mathrm{FP}^{\Sigma_3^p}$.

It is interesting to note that building a top Toda element seems to be easier than building
a king (as indicated by the previous theorems), yet recognizing a top Toda element seems
(given our current stage of knowledge) to be a more difficult task than recognizing a
king. This holds because a string $x \in \Sigma^n$ is a top Toda element at length n exactly if for
any string $y \in \Sigma^n$ there is a path from x to y; checking this condition is in PSPACE.
One can observe that this problem is also in the advice class PP/linear (that is, it can
be done in PP given an advice string of size $O(n)$). Indeed, for an arbitrary P-selector
f, let us consider the Toda-equivalence classes for the strings in $\Sigma^n$ sorted according to
the order relation defined above: $[x_1]_f > [x_2]_f > \cdots > [x_k]_f$. Then any element x in
$[x_1]_f$ beats at least the elements in $\{x\} \cup \left(\bigcup_{2 \le i \le k} [x_i]_f\right)$ and thus has outdegree at least
$1 + \sum_{2 \le i \le k} \|[x_i]_f\|$. Note that any element (of $\Sigma^n$) that is not in $[x_1]_f$ cannot beat
any element in $[x_1]_f$ and thus its outdegree is at most $\sum_{2 \le i \le k} \|[x_i]_f\|$. Thus if we use
as the advice string the binary encoding of $1 + \sum_{2 \le i \le k} \|[x_i]_f\|$ and check versus a
threshold the outdegree of a node via a PP computation, we can check whether a string
is in the top Toda class.
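
The outdegree threshold used in this argument can be checked by brute force on a small example. The Python sketch below is illustrative only: the tournament `WINS` and the names `beats`, `outdegree`, `top_class`, and `advice_threshold` are hypothetical, and the exhaustive counting merely stands in for the PP computation.

```python
from itertools import product

# Hypothetical toy tournament on length-2 strings: 00 -> 01 -> 10 -> 00 is a
# cycle, and each of those three strings beats 11.
WINS = {("00", "01"), ("01", "10"), ("10", "00"),
        ("00", "11"), ("01", "11"), ("10", "11")}
NODES = ["".join(bits) for bits in product("01", repeat=2)]


def beats(x, y):
    """Edge relation of the tournament; every node has a self-loop."""
    return x == y or (x, y) in WINS


def outdegree(x):
    """Number of strings y (including x itself) that x beats."""
    return sum(1 for y in NODES if beats(x, y))


def top_class():
    """Brute force: the top Toda class is the set of nodes reaching every node."""
    def reach(start):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in NODES:
                if v not in seen and beats(u, v):
                    seen.add(v)
                    stack.append(v)
        return seen
    return {x for x in NODES if len(reach(x)) == len(NODES)}


def advice_threshold():
    """The advice value: 1 plus the total size of all non-top classes."""
    return 1 + (len(NODES) - len(top_class()))


def in_top_by_threshold(x, threshold):
    """Membership test that only compares an outdegree against the advice."""
    return outdegree(x) >= threshold
```

Here the advice value is 2, and the threshold test agrees with the brute-force top class on all four strings.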

On the other hand, a string x in $G_{f,n}$ is a king node if and only if
$(\forall y_1 : |y_1| = |x|)(\exists y_2 : |y_2| = |x|)[(x \text{ beats } y_1) \vee ((x \text{ beats } y_2) \wedge (y_2 \text{ beats } y_1))]$. Thus checking
whether a string x is a king in $G_{f,n}$ can be done with a $\Pi_2^p$ computation. We note that
this fact, and also Proposition 2.9, could be shown indirectly using Tantau’s recent work
on the complexity of succinct tournament reachability [32].
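
The king predicate above translates directly into a brute-force check. The following Python sketch is a hedged illustration: the tournament `WINS` and the names `beats`, `is_king`, and `find_king` are hypothetical, and `find_king` relies on the classical observation that a node of maximum outdegree in a tournament is a king, which is not necessarily the route taken in the full version of this paper.

```python
from itertools import product

# Hypothetical toy tournament on length-2 strings: 00 -> 01 -> 10 -> 00 is a
# cycle, and each of those three strings beats 11.
WINS = {("00", "01"), ("01", "10"), ("10", "00"),
        ("00", "11"), ("01", "11"), ("10", "11")}
NODES = ["".join(bits) for bits in product("01", repeat=2)]


def beats(x, y):
    """Edge relation of the tournament; every node has a self-loop."""
    return x == y or (x, y) in WINS


def is_king(x):
    """For every y1 there is a y2 with: x beats y1, or x beats y2 and y2 beats y1."""
    return all(
        beats(x, y1) or any(beats(x, y2) and beats(y2, y1) for y2 in NODES)
        for y1 in NODES
    )


def find_king():
    """Return a node of maximum outdegree (classically known to be a king)."""
    return max(NODES, key=lambda x: sum(1 for y in NODES if beats(x, y)))
```

In this example each of the three strings on the cycle is a king, while 11 is not.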

Though all P-selective sets are known to be in the class NP/linear [17], which was
an improvement from even earlier work [8] that implicitly placed them in PP/linear, it
is an open question whether the PP/linear result for top Toda element recognition given
above can itself be strengthened to NP/linear. We conjecture that it cannot. Also, we
note in passing that in the profoundly different model in which the tournament — far from 
being uniformly specified via a P-selector function — can be explored only via queries to 
a black box, and our input additionally includes the set of nodes inducing via that black 
box a tournament over which a king (or a certain sequence of kings) is sought, bounds 
on the necessary and sufficient numbers of queries to find such have been studied in, for 
example, [30]. 
3 P-Selectivity and Ranking

While P-selectivity is an extension of polynomial-time decidability, the notion of 
polynomial-time weak rankability goes in the opposite direction. It describes sets that are 
so simple that there exists a polynomial-time algorithm that, on inputs that are elements in 
the set, outputs the number of elements in the set up to that element. Weak-P-rankability 
can be an attribute of extremely complex sets. One further generalization is to allow 
the ranking functions to belong to some broader family of functions (e.g., having some 
complexity bound less stringent than polynomial time). There have been many papers 
studying the issue of which sets can be ranked [3,7,1,20]. 

Definition 3.1. For any set $B$ and any string $x$, define $\mathrm{rank}_B(x) = \|B^{\le x}\|$.

1. [7] A set A is strongly-P-rankable if there is a polynomial-time computable function
f such that $(\forall x \in \Sigma^*)[f(x) = \mathrm{rank}_A(x)]$. We also use strongly-P-rankable to
denote the class of all sets that are strongly-P-rankable.

2. [3] A set A is P-rankable if there is a polynomial-time computable function f such
that (a) $(\forall x \in A)[f(x) = \mathrm{rank}_A(x)]$ and (b) $(\forall x \notin A)[f(x) = \text{“not in } A\text{”}]$. We
also use P-rankable to denote the class of all sets that are P-rankable.

3. [7] A set A is weakly-P-rankable if there is a polynomial-time computable function
f such that $(\forall x \in A)[f(x) = \mathrm{rank}_A(x)]$. We also use weakly-P-rankable to denote
the class of all sets that are weakly-P-rankable.

4. Let $\mathcal{F}$ be a family of functions mapping strings into natural numbers. A set A is
weakly-$\mathcal{F}$-rankable if there is a function $f \in \mathcal{F}$ such that $(\forall x \in A)[f(x) = \mathrm{rank}_A(x)]$.
We also use weakly-$\mathcal{F}$-rankable to denote the class of all sets that are
weakly-$\mathcal{F}$-rankable.

Note that, immediately from the definitions, strongly-P-rankable $\subseteq$ P-rankable $\subseteq$
weakly-P-rankable. (The first inclusion is easy to see in light of the fact that, for each
$x \ne \epsilon$, $x \in A \iff \mathrm{rank}_A(x) > \mathrm{rank}_A(x - 1)$, where $x - 1$ denotes the immediate
lexicographic predecessor of x.) Also note that for $x \notin A$, the definition of
weakly-P-rankable sets puts no constraint on the behavior of f on input x other than that f must
run in polynomial time. This is a point of similarity with P-selectivity useful for the
following refinement of P-selectivity, which has been introduced in [19]. The refinement
adds the requirement that when at least one of the inputs belongs to the set, we output
not merely an input that is in the set, but also output its correct ranking information.
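
The rank function and the predecessor fact used for the first inclusion can be illustrated on a finite set. The Python sketch below is a brute-force illustration only: the names `index`, `string_at`, `predecessor`, and `rank`, and the example set `A`, are hypothetical, and no claim is made about polynomial-time computability.

```python
def index(s):
    """Position of s in the standard lexicographic order
    ("" -> 0, "0" -> 1, "1" -> 2, "00" -> 3, ...)."""
    return int("1" + s, 2) - 1


def string_at(i):
    """Inverse of `index`: the i-th binary string in the standard order."""
    return bin(i + 1)[3:]


def predecessor(s):
    """Immediate lexicographic predecessor x - 1 (undefined for the empty string)."""
    if s == "":
        raise ValueError("the empty string has no predecessor")
    return string_at(index(s) - 1)


def rank(A, x):
    """rank_A(x): the number of strings in A that are lexicographically <= x."""
    return sum(1 for y in A if index(y) <= index(x))


A = {"0", "01", "11"}  # arbitrary finite example set
```

For this set, `rank(A, "01")` and `rank(A, "10")` are both 2, and membership of any nonempty x in A coincides with `rank(A, x) > rank(A, predecessor(x))`.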

Definition 3.2. [19] A set A is polynomial-time semi-rankable if there is a (total,
single-valued) polynomial-time computable function f such that, for every x and y,

1. $(\exists n)[f(x, y) = (x, n) \text{ or } f(x, y) = (y, n)]$, and

2. $\{x, y\} \cap A \ne \emptyset \implies [(x \in A \text{ and } f(x, y) = (x, \mathrm{rank}_A(x))) \text{ or } (y \in A \text{ and } f(x, y) = (y, \mathrm{rank}_A(y)))]$.

In such a case, we say that f is a semi-ranking function for A. We use P-sr to denote the
class of sets that are polynomial-time semi-rankable.
As noted in [19], P-sr = P-sel $\cap$ weakly-P-rankable. Among other results, it is
shown in [19] that P-sr is a proper subset of P-sel, i.e., there are P-selective sets that are not
weakly-P-rankable (there are also sets that are weakly-P-rankable but not P-selective).
In the full version of this paper, we provide a new and short argument showing this fact,
and even that there are P-selective sets that are not weakly-rankable by any function in,
for example, the arithmetical hierarchy.

It is shown in [19] that P-sr has structural properties different from those of P-sel.
For example, unlike P-sel, P-sr is not closed under complementation, union with P sets,
or join with P sets. It is natural to ponder whether P-sel can be separated from the class
of weakly-P-rankable sets or from P-sr in a stronger sense, namely with immunity. We
note first that the above approach is useless because $L(0) = \emptyset$ and any standard left cut
$L(\gamma)$, $0 < \gamma < 1$, contains the subset $\{0^j \mid j \in \mathbb{N}\}$, which belongs to P-sr. Perhaps
somewhat surprisingly it turns out, as we will show as Theorem 3.5, that the statement
“P-sel is weakly-P-rankable immune” implies $\mathrm{P} \ne \mathrm{P}^{\Sigma_2^p}$ (equivalently, $\mathrm{P} \ne \mathrm{NP}$) and thus
it seems beyond reach at this time.

A set S is P-printable [6] if there exists a polynomial-time algorithm such that, for
each $n \in \mathbb{N}$, on input $0^n$ the algorithm outputs exactly the members of S having length at
most n. Note that every P-printable set is sparse and belongs to P. The above definition
can be relativized in the standard way.

Lemma 3.3. Let $Q$ be any set. If $A$ is P-selective, $S$ is $\mathrm{P}^Q$-printable, and $A \cap S$ is
infinite, then $A \cap S$ (and thus also $S$, and most particularly also $A$) has an infinite
weakly-$\mathrm{FP}^Q$-rankable subset.

The following results immediately follow from Lemma 3.3. 

Theorem 3.4. P-sel is not bi-immune to the class of weakly-P-rankable sets. 



Theorem 3.5. Any infinite P-selective set A has an infinite weakly-$\mathrm{FP}^{\Sigma_2^p}$-rankable
subset.

Corollary 3.6. If P = NP, then P-sel is not immune to the class of weakly-P-rankable
sets.

The reader may wish to compare Theorem 3.5 with the following result (neither
of which seems to imply the other) from [9] (see also [10]) regarding printability [6]:
Each infinite P-selective set B has an infinite FP®®^^ -printable subset. 

An important subclass of P-sel is the class of sets that are standard left cuts, a class
that we have already used. Recall that for each real $0 \le \gamma < 1$ the standard left cut of $\gamma$ is
the set $L(\gamma) = \{\beta_1 \beta_2 \cdots \beta_z \mid z \in \mathbb{N} \wedge (\forall j : 1 \le j \le z)[\beta_j \in \{0, 1\}] \wedge \sum_{1 \le i \le z} \beta_i / 2^i < \gamma\}$.
All the P-selective sets that have been constructed in the literature are either standard
left cuts or are $\le_m^p$-equivalent to a standard left cut and, in fact, proving that there is
a P-selective set that is not $\le_m^p$-equivalent to a standard left cut is known to be as
hard as showing $\mathrm{P} \ne \mathrm{PP}$ [8]. In contrast, we observe that standard left cuts that are
weakly-P-rankable are always in P.

Theorem 3.7. If A is a standard left cut, then A is weakly-P-rankable if and only if A
is strongly-P-rankable.
Since strongly-P-rankable $\subseteq$ P-rankable = P $\cap$ weakly-P-rankable, we have the
following immediate corollary.

Corollary 3.8. Each weakly-P-rankable standard left cut belongs to P. 

Abstract. In an earlier paper we gave efficient algorithms for partitioning
chordal graphs into $k$ independent sets and $\ell$ cliques. This is a
natural generalization of the problem of recognizing split graphs, and is
NP-complete for graphs in general, unless $k \le 2$ and $\ell \le 2$. (Split graphs
have $k = \ell = 1$.)

In this paper we expand our focus and consider general M-partitions, 
also known as trigraph homomorphisms, for the class of chordal graphs. 
For each symmetric matrix M over 0, 1, *, the M-partition problem seeks 
a partition of the input graph into independent sets, cliques, or arbitrary 
sets, with some pairs of sets being required to have no edges, or to have 
all edges joining them, as encoded in the matrix M. Such partitions 
generalize graph colorings and homomorphisms, and arise frequently in 
the theory of graph perfection. We show that many M-partition problems 
that are NP-complete in general become solvable in polynomial time for 
chordal graphs, even in the presence of lists. On the other hand, we show 
that there are M-partition problems that remain NP-complete even for 
chordal graphs. We also discuss forbidden subgraph characterizations for 
the existence of M-partitions. 



1 Introduction 

The M-partition problem was introduced in [8]. Let M be a symmetric m x m
matrix with entries $M(i,j) \in \{0, 1, *\}$. An instance of the M-partition problem is a
graph G. A solution for the instance is a partition of vertices in G into m parts, 
corresponding to the rows (and columns) of the matrix M, such that for distinct 
vertices x and y of the graph G, placed in parts i and j (possibly with i = j)
respectively, we have the following:

— if $M(i,j) = 0$, then xy is not an edge of G;

— if $M(i,j) = 1$, then xy is an edge of G.

(If $M(i,j) = *$, then xy may or may not be an edge in G.)

An instance of the list M-partition problem is a graph G, together with a
collection of lists $L(x)$, $x \in V(G)$, each list being a set of parts. A solution for
the instance of list M-partition is a solution for the corresponding M-partition,
such that each vertex x is placed in a part $i \in L(x)$.
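
The definition can be read as a simple verification procedure. The following Python sketch checks whether a proposed assignment of vertices to parts is a (list) M-partition; it is a minimal illustration, and the function name `is_list_m_partition`, the encoding of `*` as the string `"*"`, and the example graph are assumptions made here rather than notation from [8].

```python
STAR = "*"  # encoding chosen here for the unconstrained matrix entry


def is_list_m_partition(M, vertices, edges, part, lists=None):
    """Check that `part` (vertex -> part index) is an M-partition of the graph,
    and, if `lists` is given, that every vertex is placed in a part on its list."""
    edge_set = {frozenset(e) for e in edges}
    if lists is not None:
        for x in vertices:
            if part[x] not in lists[x]:
                return False
    for pos, x in enumerate(vertices):
        for y in vertices[pos + 1:]:          # distinct vertices only
            entry = M[part[x]][part[y]]
            adjacent = frozenset((x, y)) in edge_set
            if entry == 0 and adjacent:       # 0 forbids the edge xy
                return False
            if entry == 1 and not adjacent:   # 1 requires the edge xy
                return False
    return True


# Split graphs: part 0 is an independent set, part 1 is a clique.
SPLIT = [[0, STAR],
         [STAR, 1]]

# Example graph: a triangle a, b, c with a pendant vertex d attached to a.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
good = {"a": 1, "b": 1, "c": 1, "d": 0}
bad = {"a": 0, "b": 0, "c": 1, "d": 0}   # a and b are adjacent, both in part 0
```

With these definitions, `good` is accepted and `bad` is rejected; restricting the list of `d` to part 1 only also makes `good` fail, since `d` is then not placed in a part on its list.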

List M-partitions generalize list colorings, retractions, and list homomor- 
phisms [7], and are of interest in the theory of perfect graphs [3,4]. Many well- 
known problems seeking, say, clique cutsets, homogeneous sets, skew cutsets, 
joins, etc., can be formulated as list M-partition problems [8]. Moreover, the 
study of list M-partition problems can lead to efficient solutions of some of 
these problems [4]. 

In [8] we have given polynomial time algorithms for many list M-partition
problems, and quasi-polynomial ($O(n^{\log n})$) time algorithms for certain others.
In [6] we have shown that all list M-partition problems are solvable in
quasi-polynomial time, or are NP-complete. (We call such a result a quasi-dichotomy.)
Many of our quasi-polynomial time algorithms from [8] were improved to poly- 
nomial time algorithms in [2,4], but it is not known whether all list M-partition 
problems are polynomial time solvable or NP-complete; this is known as the 
Dichotomy Problem for list M-partitions. 

In this paper, we consider the restrictions of both the M-partition and the 
list M-partition problems to instances G that are chordal graphs. The two cor- 
responding problems will be called the chordal M-partition problem and chordal 
list M-partition problem. 

There are several classical examples to suggest that M-partitions of chordal
graphs can be found in polynomial time. For instance, k-colorability of chordal
graphs (M is the k × k matrix with 0 on the diagonal and * everywhere else)
can be decided efficiently using a perfect elimination ordering [9]; in fact, the
algorithm either produces a k-coloring of the input graph or produces the unique
forbidden subgraph $K_{k+1}$. A similar result is known about clique covering (M
is the ℓ × ℓ matrix with 1 on the diagonal and * elsewhere). In [10] we have
shown more generally that there is a polynomial time recognition algorithm,
and a forbidden subgraph characterization, of graphs that can be partitioned
into k independent sets and ℓ cliques (M has k zeros and ℓ ones on the diagonal,
* everywhere else).
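The k-colorability test mentioned above can be sketched as follows. This is a minimal illustration, not the algorithm of [9] verbatim: it assumes the perfect elimination ordering is already given, and the names `greedy_color_chordal` and `k_colorable_chordal` are hypothetical.

```python
def greedy_color_chordal(adj, peo):
    """Greedily color a chordal graph, scanning the elimination ordering in reverse.

    adj: dict mapping each vertex to the set of its neighbours.
    peo: list of vertices in which the later neighbours of every vertex
         form a clique (assumed, not checked here).
    Returns a dict vertex -> color index (0-based).
    """
    color = {}
    for v in reversed(peo):
        # Already-colored neighbours of v are exactly its later neighbours.
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def k_colorable_chordal(adj, peo, k):
    """Return a coloring with at most k colors, or None if more are needed."""
    color = greedy_color_chordal(adj, peo)
    return color if max(color.values(), default=-1) < k else None
```

Because the later neighbours of each vertex form a clique, the greedy step uses more than k colors only when some vertex together with its later neighbours forms a clique on k + 1 vertices.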

We further extend these results to the list M-partition problem. We also 
extend the class of matrices M for which we can give polynomial time algorithms, 
and forbidden subgraph characterizations. However, we also find M-partition 
problems that remain NP-complete for chordal graphs, even in the absence of 
lists. Certain dichotomy and quasi-dichotomy results will also be claimed.

2 Matrices M with 0, 1 Diagonal

If the diagonal of the matrix M contains no *, we have several large classes of
polynomially solvable list M-partition problems, including the list versions of
the above problem of partitioning G into k independent sets and ℓ cliques.

Consider first the case where the k × k matrix M has zero diagonal.

Theorem 1. If all diagonal entries of M are zero, then the chordal list
M-partition problem can be solved in polynomial time.

Proof. A chordal graph G which admits an M-partition with such a matrix M
cannot have a clique with k + 1 vertices; hence it must have treewidth at most
k - 1. The existence of a list M-partition of a graph of bounded treewidth can be
tested by standard dynamic programming techniques [1,5,11]. Recall that a tree
decomposition of a graph G is a pair (X, U) where U is a tree and $X = \{X_i\}_{i \in V(U)}$
is a collection of subsets of V(G) whose union equals V(G), such that each edge
xy of G is included in some $X_i$, and such that for each vertex x of G, the set of
all $X_i$ containing x forms a subtree of U. The treewidth of a decomposition is
the maximum value of $|X_i| - 1$, and the treewidth of a graph is the minimum
treewidth of a decomposition.

A tree decomposition in which U has a fixed root r is called nice [1] if each
node of the rooted tree U has at most two children, and the following conditions
are satisfied: if i has two children, say j and h (a join node), then $X_i = X_j = X_h$;
if i has one child j, then $X_i$ is obtained from $X_j$ by adding (an introduce node) or
deleting (a forget node) a single vertex of G; and $|X_t| = 1$ for each leaf (start
node) t of U. It is known that a nice tree decomposition of a chordal graph of
bounded treewidth can be obtained in linear time [1].
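The four node types of a nice tree decomposition can be checked mechanically. The sketch below assumes the decomposition is given as a dictionary of bags plus a child list per node; the function name and this representation are illustrative choices, not taken from [1].

```python
def classify_nice_nodes(bags, children):
    """Label every node as 'start', 'introduce', 'forget' or 'join'.

    bags: dict node -> set of graph vertices (the bag X_i).
    children: dict node -> list of children in the rooted tree U.
    Raises ValueError if a node violates the conditions of a nice decomposition.
    """
    kind = {}
    for i, bag in bags.items():
        kids = children.get(i, [])
        if not kids:
            if len(bag) != 1:
                raise ValueError(f"leaf {i!r} must have a bag of size 1")
            kind[i] = "start"
        elif len(kids) == 1:
            child_bag = bags[kids[0]]
            if child_bag < bag and len(bag) == len(child_bag) + 1:
                kind[i] = "introduce"   # X_i = X_j plus one vertex
            elif bag < child_bag and len(child_bag) == len(bag) + 1:
                kind[i] = "forget"      # X_i = X_j minus one vertex
            else:
                raise ValueError(f"node {i!r} differs from its child by more than one vertex")
        elif len(kids) == 2:
            if any(bags[j] != bag for j in kids):
                raise ValueError(f"join node {i!r} must repeat its bag in both children")
            kind[i] = "join"
        else:
            raise ValueError(f"node {i!r} has more than two children")
    return kind
```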

Given a nice tree decomposition (X, U) of G with root r, we denote by $G_i$
the subgraph of G induced by the union of $X_i$ and all $X_j$ where j is a descendant
of i. Let F(i) be the set of all pairs (Π, S), where Π is an assignment of the
vertices in $X_i$ to parts, obtained by restricting a list M-partition Σ of $G_i$, and
S is the set of those parts in the partition Σ which contain vertices of $G_i - X_i$.
Note that each F(i) has at most $(2k)^k$ elements.

We can compute the set F(i) for any node, once all its descendants j have
had their values F(j) calculated. This is not hard to see, considering separately
the start, introduce, forget, and join nodes. For instance, suppose i is a forget
node, with the unique child j, and $X_i = X_j - x$. For each (Π, S) ∈ F(j) we add
to F(i) the pair (Π', S'), where Π' is Π restricted to $X_i$ and S' equals either
S, if the part a that x was assigned in Π was already present in S, or
S ∪ {a} otherwise. On the other hand, if i is an introduce node, with the unique
child j and $X_j = X_i - x$, then for each (Π, S) ∈ F(j) we consider all values x can
take that are compatible with the current assignment Π, given the adjacencies
of x in $X_j$ and the non-adjacencies of x in $G_i - X_i$; it is for this purpose that
we keep track of the set S.

The above proof yields an algorithm for the list M-partition problem
restricted to graphs of treewidth at most k - 1 (and hence for all chordal graphs),
of complexity $O(n(2k)^k)$; the complexity analysis is easily adapted from that of
[5].

We next consider the case where the ℓ × ℓ matrix M has all diagonal entries
one. Let G with lists L(x) be an instance of the chordal list M-partition problem.
A rectangle in G is a collection of sublists L'(x) ⊆ L(x), x ∈ V(G), such that any
choice of parts from L'(x) for each x constitutes a solution.

Theorem 2. If all diagonal entries of M are one, then the chordal list M- 
partition problem can be solved in time polynomial in . The set of solutions to 
an instance is the union of at most $n^{2\ell}$ rectangles, and can be found in polynomial
time. 

Proof. Consider a perfect elimination ordering of the graph. If there are ℓ parts,
then choose ℓ pairs $(x_i, y_i)$ of vertices in the input graph G, where $x_i$ will be
the first vertex in the perfect elimination ordering to go to part i, and $y_i$ the
last vertex in the perfect elimination ordering to go to part i. This involves
$n^{2\ell}$ possible choices. For each choice, remove part i from the list of any vertex that
occurs either before $x_i$ or after $y_i$ in the elimination ordering.

Remove from the lists of all vertices z the parts j that M forbids, given the
adjacency or non-adjacency of z to the vertices $x_i, y_i$ that go to part i.
That is, vertex z cannot go to part j if there is an edge $zx_i$ or an edge
$zy_i$ in G and the entry M(i,j) = 0. Similarly, vertex z cannot go to part j if
there is no edge $zx_i$ or no edge $zy_i$ in G and the entry M(i,j) = 1.
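The two list-reduction rules of this proof (the window between $x_i$ and $y_i$, and the adjacency test against $x_i$ and $y_i$) can be sketched for one fixed choice of pairs. The function name `prune_lists` and the data layout (M as a nested list with entries 0, 1 or '*') are assumptions made for illustration only.

```python
def prune_lists(adj, lists, M, order, first, last):
    """Apply the list reductions for one choice of pairs (x_i, y_i).

    adj:   dict vertex -> set of neighbours.
    lists: dict vertex -> set of allowed parts.
    M:     M[i][j] is 0, 1 or '*'.
    order: perfect elimination ordering (list of vertices).
    first, last: dict part i -> chosen vertex x_i, respectively y_i.
    Returns new, reduced lists; the input lists are not modified.
    """
    pos = {v: t for t, v in enumerate(order)}
    pruned = {v: set(parts) for v, parts in lists.items()}
    for i in first:
        x, y = first[i], last[i]
        for z in order:
            # Part i is unavailable before x_i and after y_i.
            if pos[z] < pos[x] or pos[z] > pos[y]:
                pruned[z].discard(i)
            # Compare z with the two representatives of part i.
            for w in (x, y):
                if z == w:
                    continue
                adjacent = z in adj[w]
                for j in list(pruned[z]):
                    if (adjacent and M[i][j] == 0) or (not adjacent and M[i][j] == 1):
                        pruned[z].discard(j)
    return pruned
```

An empty list after pruning means that this particular choice of pairs yields no solution; otherwise the reduced lists describe one rectangle.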

Finally, assign parts to vertices from their resulting reduced lists
arbitrarily. Suppose $z_i, z_j$ end up in parts i, j respectively and are adjacent, but
M(i,j) = 0. Say $z_i$ occurs before $z_j$ in the perfect elimination ordering. Then
$z_i$ is adjacent to $y_i$, since M(i,i) = 1. Thus $y_i$ and $z_j$ are both neighbors of $z_i$,
and both occur after $z_i$, so $y_i$ is adjacent to $z_j$ by the definition of a perfect
elimination ordering. Since M(i,j) = 0, part j would have been removed from
the list of $z_j$.

In the other case, suppose $z_i, z_j$ end up in parts i, j respectively and are not
adjacent, but M(i,j) = 1. Say $z_i$ occurs before $z_j$. Then $x_i$ is adjacent to $z_i$ since
M(i,i) = 1. Also $x_i$ is adjacent to $z_j$ since M(i,j) = 1. Thus $x_i$ is adjacent to
both $z_i$ and $z_j$, and both occur after $x_i$, so $z_i$ is adjacent to $z_j$ by the definition
of a perfect elimination ordering, a contradiction.

Thus we end up with $n^{2\ell}$ families of solutions, each family given only by
restrictions on possible parts for each element, so that each family is a rectangle.

In the rest of the paper we often focus on (k + ℓ) × (k + ℓ) matrices M which
consist of a k × k diagonal block A and an ℓ × ℓ diagonal block B, with an
off-diagonal k × ℓ matrix C (and its ℓ × k transpose). We shall call such matrices
A, B, C-block matrices.

Assume now that all diagonal entries of A are zero, and all diagonal entries
of B are one. We shall also consider restrictions on C.

Feder, Hell, Klein, and Motwani [8] showed the following. Let $\mathcal{A}$ and $\mathcal{B}$ be
two classes of graphs that are closed under taking induced subgraphs, and for
which membership can be tested in polynomial time. Suppose further that there
exists a constant c such that any graph both in $\mathcal{A}$ and $\mathcal{B}$ has at most c vertices.
They consider the question of partitioning the vertices of a graph G into two sets
$S_A$ and $S_B$ so that the subgraph $G_A$ induced by $S_A$ is in $\mathcal{A}$, and the subgraph
$G_B$ induced by $S_B$ is in $\mathcal{B}$. They show that there are at most $n^{2c}$ such partitions,
and that all such partitions can be found in polynomial time.

In our application, we let $\mathcal{A}$ be the class of graphs without a clique with k + 1
vertices, and $\mathcal{B}$ the class of graphs without an independent set of ℓ + 1 vertices.
A chordal graph without a clique with k + 1 vertices and without an independent
set with ℓ + 1 vertices has at most c = kℓ vertices, since it is k-colorable, and
thus a union of k independent sets, each with at most ℓ vertices.

Given an instance G, let $S_A$ be the set of vertices that are placed in the parts
corresponding to the k × k matrix A, and let $S_B$ be the set of vertices that go to
the parts corresponding to the ℓ × ℓ matrix B. It follows that $G_A \in \mathcal{A}$ and $G_B \in \mathcal{B}$.

Suppose first the k × ℓ matrix C is all *. Then for each of the valid
partitions into two graphs $G_A$ and $G_B$, we can restrict the lists for $G_A$ to parts
in A and solve the problem for matrix A on $G_A$ with the algorithm of Theorem 1.
Similarly, we can restrict the lists for $G_B$ to parts in B and solve the problem
for matrix B on $G_B$ with the algorithm of Theorem 2.

More generally, we call a matrix C horizontal if all entries of C corresponding
to a part i in A are the same, and vertical if all entries of C corresponding to a
part j in B are the same. Finally, we call matrix C crossed if the entries of C
are all 0 or * (or all 1 or *) and every zero (respectively every one) belongs to
either a row or a column of all zeros (respectively all ones).

Theorem 3. Suppose M is an A, B, C-block matrix.

If all diagonal entries of A are zero, all diagonal entries of B are one, and if
C is either horizontal, vertical, or crossed, then the chordal list M-partition
problem can be solved in time polynomial in .

Proof. For each choice of $G_A$ and $G_B$, if all entries of C corresponding to a part
i in A are zero (respectively one) then it suffices to remove the part i from the
list of any vertex v of $G_A$ that has a neighbor in $G_B$ (respectively for which
some vertex of $G_B$ is not a neighbor). Similarly, if all entries of C corresponding
to a part j in B are zero (respectively one) then it suffices to remove the part j
from the list of any vertex v of $G_B$ that has a neighbor in $G_A$ (respectively for
which some vertex of $G_A$ is not a neighbor). Once the conditions given by C are
met, we can replace C by an all * matrix and solve the problem for $G_A$ and $G_B$
using Theorems 1 and 2.

If G is vertical, then the complexity can be improved to n^^“*'‘^*^^i(2fc)*). 

We can generalize this result to matrices A which consist of diagonal blocks
$A_i$ with zero diagonals, and matrices B which consist of diagonal blocks $B_i$ with
all diagonal entries one, as long as all entries of A not in the diagonal blocks are
one, all entries of B not in the diagonal blocks are zero, and all block matrices
$C_{ij}$ of C corresponding to $A_i, B_j$ are either horizontal, vertical, or crossed.

3 NP-Complete Problems

Consider a fixed bipartite graph H. The list H-coloring problem is defined as
follows: An instance is a bipartite graph G with lists (white vertices of G have lists
consisting of white vertices of H and similarly for black vertices), and a solution
is a mapping of vertices of G to vertices of H so that adjacency is preserved and
each vertex of G is mapped to a member of its list. (Such a mapping is called a
list H-coloring of G.) Feder, Hell and Huang [7] showed that the list H-coloring
problem is polynomial time solvable if the bipartite graph H is the complement
of a circular arc graph (a cocircular graph), and is NP-complete otherwise. Based
on this result, it will be possible to find NP-complete chordal list M-partition
problems.

Given a bipartite graph H with k white vertices (forming the set $V_A$) and ℓ
black vertices (forming the set $V_B$), the matrix corresponding to H is the k × ℓ
matrix C with C(i,j) = * if ij is an edge in H (with $i \in V_A$, $j \in V_B$), and with
C(i,j) = 0 otherwise.
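As a small illustration of this definition, the matrix corresponding to H can be built directly from an edge list. The function name and the list-of-lists representation are assumptions for this sketch.

```python
def matrix_of_bipartite(white, black, edges):
    """Return the k x l matrix C of a bipartite graph H.

    white, black: lists of the white and black vertices (row and column order).
    edges: iterable of (white_vertex, black_vertex) pairs.
    C[i][j] is '*' if the i-th white and j-th black vertex are adjacent, else 0.
    """
    edge_set = set(edges)
    return [["*" if (w, b) in edge_set else 0 for b in black] for w in white]
```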

Theorem 4. Let M be an A, B, C-block matrix.

Suppose A does not contain any 1's, and B does not contain any 0's. If C is
the matrix corresponding to a bipartite graph H that is not a cocircular graph,
then the chordal list M-partition problem is NP-complete.

Proof. Consider an instance G of the list H-coloring problem, and define the
graph G' to be obtained from G by adding all edges between pairs of black
vertices. (The lists of G' remain the same as in G.) It is easy to see that G has a
list H-coloring if and only if G' has a list M-partition. Since G' is a split graph
(it can be partitioned into a clique and an independent set), it is also chordal
[9].
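The construction of G' in this proof is a one-step transformation, sketched below with a hypothetical helper name. It assumes the input graph is bipartite with the given black side, so that the white vertices stay independent.

```python
def add_black_clique(adj, black):
    """Return G': a copy of G in which all pairs of black vertices are adjacent.

    adj: dict vertex -> set of neighbours of the bipartite graph G.
    black: iterable of the black vertices of G.
    """
    black = list(black)
    g = {v: set(ns) for v, ns in adj.items()}
    for u in black:
        for v in black:
            if u != v:
                g[u].add(v)
    return g
```

The black vertices of the result form a clique and the white vertices an independent set, which is the split partition used in the proof.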



The same result holds if C is obtained from the matrix corresponding to a
bipartite graph H by replacing each 0 with a 1. (This follows by replacing the 
bipartite graph G' by the bipartite complement G" of G.) 

The proof implies that the list M-partition problems corresponding to graphs 
that are not cocircular are NP-complete even for split graphs. It is easy to see 
that, in the special case when A is an all zero matrix and B is an all one 
matrix, we in fact obtain the following dichotomy (again, valid also for matrices 
C obtained by replacing all zeros by ones):

Theorem 5. Let M be an A, B, C-block matrix.

If A = 0, B = 1, and C is the matrix corresponding to a bipartite graph H, 
then the chordal list M-partition problem is polynomial if H is a cocircular graph 
and is NP-complete otherwise. 

A similar quasi-dichotomy result can be derived from the theorem of Feder
and Hell [6], who showed that on general instances, all list M-partition problems
are quasi-polynomial or NP-complete. In particular, if A and B are as above
(all-zero and all-one matrices), and C is an arbitrary matrix (not necessarily
corresponding to a graph H), then it can be shown that in fact the chordal list
M-partition problem is quasi-polynomial or NP-complete.

Several generalizations of these dichotomy and quasi-dichotomy results can 
be proved: It is enough to assume, for instance, that B (instead of being an all 
one matrix) has ones on the diagonal and no zeros. The quasi-dichotomy also 
applies if B is only assumed to have ones on the diagonal, as long as A has zeros 
on the diagonal and no *’s. In this case, if additionally C has no zeros (or no 
ones), we have dichotomy. These results will be proved elsewhere. 

We now focus on constructing NP-complete M-partition problems (without
lists). Let H again be a bipartite graph. The H-retraction problem is the
restriction of the list H-coloring problem to instances G containing H as a subgraph,
and with lists either L(g) = {g}, if g ∈ V(H), or L(g) = V(H), otherwise. A list
H-coloring of G is called an H-retraction of G, in this situation. Many bipartite
graphs H are known to yield NP-complete H-retraction problems, although a
complete classification of complexity is not known, and dichotomy has not been
proved, for H-retractions. In particular, it is known that if H is an even cycle of
length greater than four, the H-retraction problem is NP-complete [7].

Theorem 6. For every bipartite graph H such that the H-retraction problem
is NP-complete, there exists a matrix $M_H$ such that the chordal $M_H$-partition
problem (without lists) is also NP-complete.

Proof. Let H be a bipartite graph such that the H-retraction problem is
NP-complete. We first extend the graph H to a larger bipartite graph H', by
attaching to each white vertex of H a path of length five and to each black vertex of
H a path of length four. Note that all the leaves (vertices of degree one) of H'
are black.

We now introduce an auxiliary problem, which we shall call the weak
H'-retraction problem. Suppose that the bipartite graph H' has k black vertices,
forming the set $V_B$, and let L denote the set of all black leaves of H'. An instance
of the weak H'-retraction problem is a bipartite graph G with a specified set X of
k black vertices, such that each vertex of G not in X has at most one neighbour in
X. A solution to the instance is an edge-preserving and color-preserving mapping
of the vertices of G to the vertices of H' such that X is mapped bijectively to $V_B$.
We now show that the H-retraction problem reduces to the weak H'-retraction
problem.

Suppose G is an instance of the H-retraction problem, i.e., a bipartite graph
containing H. We transform G to an instance G' (with a set X) of the weak
H'-retraction problem as follows: Let X be another copy of the set $V_B$, disjoint
from G. Consider the union of G and X, and identify each vertex of L in X
with the corresponding vertex of L in G. Finally, add internally disjoint paths of
length four joining all pairs of vertices of X which correspond to vertices in $V_B$
of distance two or four in H'. Call the resulting graph G'. We now argue that G
admits an H-retraction if and only if G' admits a weak H'-retraction.

On the one hand, suppose f is an H-retraction of G. Then f, extended by
taking each vertex of X - L to the corresponding vertex of $V_B$, is a weak
H'-retraction of G'. For the other direction, we note that any bijection between X
and $V_B$ has to map vertices of L to vertices of L, since leaves in H' have exactly
two vertices in H' at distance two or four, while black vertices of H' that are
not leaves have at least three vertices in H' at distance two or four. Therefore,
any weak H'-retraction of G' which maps the vertices of X bijectively to the
vertices of $V_B$ must map the copy of H' in G' isomorphically to H'. It follows
that G admits an H'-retraction, which can easily be modified to an H-retraction
by mapping all the added paths of H' into H.

Next, we define a matrix $M_H$ such that the chordal $M_H$-partition problem
(without lists) is NP-complete, as claimed in the theorem. The matrix $M_H$ will
be an A, B, C-block matrix in which the diagonal matrix A is an all zero matrix;
the diagonal matrix B has all diagonal entries one and all other entries *; and
finally, the matrix C will be the matrix corresponding to the bipartite graph H'.

We now reduce the weak H'-retraction problem to the $M_H$-partition problem.
Given an instance G' for the weak H'-retraction problem, we construct
an instance G" of the $M_H$-partition problem as follows. We replace each white
vertex a of G' by a set I(a) of k + 1 independent vertices (where $k = |V_B|$), and
each black vertex b of G' by a clique K(b) of two vertices. Whenever a and b are
adjacent in G', all vertices of I(a) are adjacent to all vertices of K(b) in G". Finally,
we add all edges between K(b) and K(b') unless both b and b' are in X. Note that
each vertex of every I(a) is adjacent to at most one K(b) with b ∈ X.

We claim that G' admits a weak H'-retraction if and only if G" admits an
$M_H$-partition. Indeed, if f is a weak H'-retraction of G', all vertices of a set I(a)
can be placed in the part f(a) and all vertices of a set K(b) can be placed in
the part f(b). Conversely, each $M_H$-partition of G" must place at least one of
the two vertices in any K(b) in a part in B, since A is an all-zero matrix. Also,
if b, b' are both in X, these vertices must be placed in distinct parts of B. By a
similar argument, at least one vertex of each I(a) must be placed in a part in
A, since the vertices placed in parts in B are covered by k cliques. This way we
deduce a weak H'-retraction of G'.

It remains to argue that the instance G" is a chordal graph. We first note
that each vertex of every I(a) is only adjacent to vertices in K(b) with b ∉ X,
except possibly in one K(b) with b ∈ X. According to the definition of G", these
vertices are all mutually adjacent, i.e., they form a clique. Thus we can repeatedly
remove simplicial vertices (vertices whose neighbours form a clique) from the sets
I(a), until G" is reduced to the union of the K(b), which is clearly chordal.
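The argument relies on the standard fact that a graph is chordal exactly when its vertices can be removed one at a time, each being simplicial at the moment of removal. A minimal, unoptimized sketch of that test follows; the function name is hypothetical and no attempt is made at linear running time.

```python
from itertools import combinations


def simplicial_elimination(adj):
    """Repeatedly remove a simplicial vertex.

    adj: dict vertex -> set of neighbours.
    Returns the removal order (a perfect elimination ordering) if the graph is
    chordal, and None if at some point no simplicial vertex exists.
    """
    g = {v: set(ns) for v, ns in adj.items()}
    order = []
    while g:
        for v, ns in g.items():
            # v is simplicial if its current neighbours are pairwise adjacent.
            if all(b in g[a] for a, b in combinations(ns, 2)):
                break
        else:
            return None
        order.append(v)
        for u in g[v]:
            g[u].discard(v)
        del g[v]
    return order
```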

4 Conclusions 

We also have some forbidden subgraph characterizations of M-partitionable
chordal graphs. It is well-known, for example, that a chordal graph G is
k-colorable if and only if it does not contain a $K_{k+1}$. In [8] we have extended this
as follows: A chordal graph G can be partitioned into k independent sets and ℓ
cliques if and only if it does not contain an induced subgraph isomorphic to
$(\ell+1)K_{k+1}$. For many other matrices M it is possible to characterize
non-M-partitionable chordal graphs G by a finite number of forbidden subgraphs. At
the same time, it follows from our results that, unless P=NP, this is not the case
for all matrices M.

Once again, we shall consider only A, B, C-block matrices M, with A having
zero diagonal and B having a diagonal of ones. Moreover, we shall assume that
all off-diagonal entries of A are the same, say a, all off-diagonal entries of B are
the same, say b, and all entries of C are the same, say c. Note that we may assume
a ≠ 0 and b ≠ 1, otherwise we may replace M by a matrix with k = 1 or ℓ = 1
respectively. The result of [10] states that when a = b = c = *, a chordal graph
is non-M-partitionable if and only if it contains an induced subgraph isomorphic
to $(\ell+1)K_{k+1}$.

We can show that if c ≠ *, the non-M-partitionable chordal graphs can
always be characterized by a finite number of forbidden subgraphs, all with at
most (k + 1)(ℓ + 1) vertices.

This bound does not always apply: if c = *, we know that in the particular
case of k = 1 and b = 0, the largest minimal forbidden subgraph has on the
order of vertices. 

Nevertheless, even in the case c = * we can prove that there is always only 
a finite number of obstructions. If a = 1, the best bounds we currently have for 
the size of minimal obstructions are where t = 0{£) if b = *, and t = 0{£'^) 
if 6 = 0. If the remaining case a = *, 6 = 0, we have the bound 2(fc -I- 

We will return to these results in a future paper.

Abstract. For several graph theoretic parameters such as vertex cover
and dominating set, it is known that if their values are bounded by k 
then the treewidth of the graph is bounded by some function of k. This 
fact is used as the main tool for the design of several fixed-parameter 
algorithms on minor-closed graph classes such as planar graphs, single- 
crossing-minor-free graphs, and graphs of bounded genus. In this paper 
we examine the question whether similar bounds can be obtained for 
larger minor-closed graph classes, and for general families of parameters 
including all the parameters where such a behavior has been reported so 
far. 

Given a graph parameter P, we say that a graph family $\mathcal{F}$ has the
parameter-treewidth property for P if there is a function f(p) such that
every graph G ∈ $\mathcal{F}$ with parameter at most p has treewidth at most
f(p). We prove as our main result that, for a large family of parameters
called contraction-bidimensional parameters, a minor-closed graph
family $\mathcal{F}$ has the parameter-treewidth property if $\mathcal{F}$ has bounded
local treewidth. We also show “if and only if” for some parameters, and
thus this result is in some sense tight. In addition we show that, for
a slightly smaller family of parameters called minor-bidimensional
parameters, all minor-closed graph families $\mathcal{F}$ excluding some fixed graphs
have the parameter-treewidth property. The bidimensional parameters
include many domination and covering parameters such as vertex cover,
feedback vertex set, dominating set, edge-dominating set, q-dominating
set (for fixed q). We use these theorems to develop new fixed-parameter
algorithms in these contexts.



1 Introduction 

The last ten years have witnessed the rapid development of a new branch of
computational complexity, called parameterized complexity; see the book of Downey

* The last author was supported by EC contract IST-1999-14186: Project ALCOM- 
FT (Algorithms and Complexity) - Future Technologies and by the Spanish CICYT 
project TIC-2002-04498-C05-03 (TRACER) 

Farach-Colton (Ed.): LATIN 2004, LNCS 2976, pp. 109-118, 2004. 

Springer-Verlag Berlin Heidelberg 2004

& Fellows [14]. Roughly speaking, a parameterized problem with parameter k
is fixed-parameter tractable (FPT) if it admits an algorithm with running time
$f(k)\,|I|^{O(1)}$. (Here f is a function depending only on k and |I| is the size of the
instance.)

A celebrated example of a fixed-parameter tractable problem is Vertex
Cover, asking whether an input graph has at most k vertices that are incident
to all its edges. When parameterized by k, the k-Vertex Cover problem
admits a solution as fast as $O(kn + 1.285^k)$ [7]. Moreover, if we restrict k-Vertex
Cover to planar graphs then it is possible to design FPT-algorithms where
the contribution of k in the non-polynomial part of their complexity is
subexponential. The first algorithm of this type was given by Alber et al. (see [2]).
Recently, Fomin and Thilikos reported an $O(k^4 + 2^{4.5\sqrt{k}} + kn)$ algorithm for planar
k-Vertex Cover [19].
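To make the notion of fixed-parameter tractability concrete, here is the textbook bounded-search-tree procedure for k-Vertex Cover. It is not the algorithm of [7] or [19]; it is a much simpler sketch that branches on the two endpoints of an uncovered edge, giving a search tree with at most $2^k$ leaves.

```python
def vertex_cover(edges, k):
    """Return a vertex cover of size at most k, or None if none exists.

    edges: list of 2-tuples (u, v).
    Every edge must have an endpoint in the cover, so we pick any edge and
    try each of its endpoints in turn, decreasing the budget k by one.
    """
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = edges[0]
    for w in (u, v):
        remaining = [e for e in edges if w not in e]
        cover = vertex_cover(remaining, k - 1)
        if cover is not None:
            return cover | {w}
    return None
```

The recursion depth is at most k and each call does work linear in the number of edges, so the running time has the form f(k) times a polynomial in the input size.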

However, not all parameterized problems are fixed-parameter tractable. A
typical example of such a problem is Dominating Set, asking whether an
input graph has at most k vertices that are adjacent to the rest of the vertices.
When parameterized by k, the k-Dominating Set problem is known to be
W[2]-complete and thus it is not expected to be fixed-parameter tractable.
Interestingly, the fixed-parameter complexity of the same problem can be distinct
for special graph classes. During the last five years, there has been substantial
work on fixed-parameter algorithms for solving k-Dominating Set on planar
graphs and different generalizations of planar graphs. For planar graphs Downey
and Fellows [14] suggested an algorithm with running time $O(11^k n)$. Later the
running time was reduced to $O(8^k n)$ [2]. An algorithm with a sublinear
exponent for the problem with running time $O(4^{6\sqrt{34k}} n)$ was given by Alber et al. [1].
Recently, Kanj & Perković [23] improved the running time to $O(2^{27\sqrt{k}} n)$ and
Fomin & Thilikos to $O(2^{15.13\sqrt{k}} k + n^3 + k^4)$ [18]. The fixed-parameter algorithms
for extensions of planar graphs like bounded-genus graphs and graphs excluding
single-crossing graphs as minors are introduced in [11,9,15].

In the majority of these results, the design of FPT algorithms for solving
problems such as k-Vertex Cover or k-Dominating Set in a sparse graph
class $\mathcal{F}$ is based on the following lemma: every graph G in $\mathcal{F}$ where the value
of the parameter is at most p has treewidth bounded by f(p), where f is a
function depending only on $\mathcal{F}$. With some work (sometimes very technical),
a tree decomposition of width O(f(p)) is constructed and standard
dynamic-programming techniques on graphs of bounded treewidth are implemented. Of
course this method cannot be applied to every graph class $\mathcal{F}$. For instance, the
n-vertex complete graph $K_n$ has a dominating set of size one and treewidth
equal to n - 1. So the emerging question is: For which (larger) graph classes
and for which parameters can the “bounding treewidth method” be applied? In
this paper we give a complete characterization of minor-closed graph families for
which the aforementioned “bounding treewidth method” can be applied for a
wide family of graph parameters. For a given parameter P, we say that a graph
family $\mathcal{F}$ has the parameter-treewidth property for P if there is a function f(p)
such that every graph G ∈ $\mathcal{F}$ with P(G) ≤ p has treewidth
at most f(p). Our main result is that for a large family of parameters called
contraction-bidimensional parameters, a minor-closed graph family $\mathcal{F}$ has the
parameter-treewidth property if $\mathcal{F}$ has bounded local treewidth. Moreover, we
show that the converse also holds if some simple condition is satisfied by P. In
addition we show that, for a slightly smaller family of parameters called
minor-bidimensional parameters, every minor-closed graph family $\mathcal{F}$ excluding some
fixed graph has the parameter-treewidth property. The bidimensional-parameter
family includes many domination and covering parameters such as vertex cover,
feedback vertex set, dominating set, edge-dominating set, and q-dominating set
(for fixed q) (see also [11] for more examples).

The proof of the main result uses the characterization of Eppstein for minor- 
closed families of bounded local treewidth [16] and Diestel et al.’s modification 
of the Robertson & Seymour excluded-grid-minor theorem [13]. In addition, the 
proof is constructive and can be used for constructing fixed-parameter algorithms 
to decide bidimensional parameters on minor-closed families of bounded local 
treewidth. In this sense, we extend to fixed-parameter algorithms the result of 
Frick & Grohe [21] that, for each property φ definable in first-order logic, and
for each class of minor-closed graphs of bounded local treewidth, there is a (non- 
fixed-parameter) 0(n^“'"'^)-time algorithm deciding whether a given graph has 
property φ.

A preliminary and special case of our result, concerning only the dominating 
set parameter, appeared in [20] with a different and more complicated proof. 
Also, another proof of the same result appeared in [10]. In this paper we present 
shorter and more elegant proofs of the combinatorial results of [20] and [10] while 
we extend their applicability to general families of parameters. 

2 Definitions and Preliminary Results 

Let G be a graph with vertex set V(G) and edge set E(G). We let n denote the
number of vertices of a graph when it is clear from context. For every nonempty
W ⊆ V(G), the subgraph of G induced by W is denoted by G[W]. We define the
q-neighborhood of a vertex v ∈ V(G), denoted by $N_G^q[v]$, to be the set of vertices
of G at distance at most q from v. Notice that v ∈ $N_G^q[v]$. We put $N_G[v] = N_G^1[v]$.
We also often say that a vertex v dominates subset S ⊆ V(G) if $N_G[v] \supseteq S$.

Given an edge e = {x, y} of a graph G, the graph G/e is obtained from G by
contracting the edge e; that is, to get G/e we identify the vertices x and y and
remove all loops and duplicate edges. A graph H obtained by a sequence of edge
contractions is said to be a contraction of G. A graph H is a minor of a graph
G if H is a subgraph of a contraction of G. We use the notation $H \preceq G$ [resp.
$H \preceq_c G$] for H a minor [a contraction] of G. A family (or class) of graphs $\mathcal{F}$
is minor-closed if G ∈ $\mathcal{F}$ implies that every minor of G is in $\mathcal{F}$. A minor-closed
graph family $\mathcal{F}$ is H-minor-free if H ∉ $\mathcal{F}$.
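Edge contraction as defined above is easy to state on adjacency sets. The helper below is an illustrative sketch (the name `contract_edge` and the convention of keeping x as the merged vertex are assumptions).

```python
def contract_edge(adj, x, y):
    """Return G/e for the edge e = {x, y}.

    adj: dict vertex -> set of neighbours of a simple graph.
    The merged vertex keeps the name x; loops and duplicate edges disappear
    automatically because neighbourhoods are stored as sets.
    """
    if y not in adj[x]:
        raise ValueError("x and y must be adjacent")
    g = {v: set(ns) for v, ns in adj.items() if v != y}
    for u in adj[y]:
        if u != x:
            g[u].discard(y)
            g[u].add(x)
            g[x].add(u)
    g[x].discard(y)
    return g
```

For example, contracting one edge of a 4-cycle yields a triangle.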

The m × m grid is the graph on $m^2$ vertices $\{(i,j) : 1 \le i, j \le m\}$
with the edge set

$\{(i,j)(i',j') : |i-i'| + |j-j'| = 1\}$.

For i ∈ {1, 2, ..., m} the vertex set (i, j), j ∈ {1, 2, ..., m}, is referred to as the ith
row and the vertex set (j, i), j ∈ {1, 2, ..., m}, is referred to as the ith column
of the m × m grid. The vertices (i, j) of the m × m grid with i ∈ {1, m} or
j ∈ {1, m} are called boundary vertices and the rest of the vertices are called
non-boundary vertices.
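The grid and its boundary can be generated directly from the definition. The function below is a small sketch with 1-based coordinates, matching the text; its name and return format are illustrative.

```python
def grid(m):
    """Build the m x m grid.

    Returns (adj, boundary): adj maps each vertex (i, j), 1 <= i, j <= m, to
    the set of vertices at Manhattan distance 1; boundary is the set of
    vertices with a coordinate equal to 1 or m.
    """
    vertices = [(i, j) for i in range(1, m + 1) for j in range(1, m + 1)]
    adj = {v: set() for v in vertices}
    for (i, j) in vertices:
        for (a, b) in ((i + 1, j), (i, j + 1)):
            if a <= m and b <= m:
                adj[(i, j)].add((a, b))
                adj[(a, b)].add((i, j))
    boundary = {(i, j) for (i, j) in vertices if i in (1, m) or j in (1, m)}
    return adj, boundary
```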

The notion of treewidth was introduced by Robertson and Seymour [25].
A tree decomposition of a graph G is a pair $(\{X_i \mid i \in I\}, T = (I, F))$, with
$\{X_i \mid i \in I\}$ a family of subsets of V(G) and T a tree, such that

1. $\bigcup_{i \in I} X_i = V(G)$;

2. for all $\{v, w\} \in E(G)$, there is an $i \in I$ with $v, w \in X_i$; and

3. for all $i_0, i_1, i_2 \in I$, if $i_1$ is on the path from $i_0$ to $i_2$ in T, then $X_{i_0} \cap X_{i_2} \subseteq X_{i_1}$.

The width of the tree decomposition $(\{X_i \mid i \in I\}, T = (I, F))$ is $\max_{i \in I} |X_i| - 1$.
The treewidth tw(G) of a graph G is the minimum width of a tree decomposition
of G.
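The three conditions can be verified mechanically. The sketch below assumes that the structure passed as `tree_edges` really is a tree on the bag indices (this is not checked), and it tests condition 3 in its equivalent form: for every vertex, the nodes whose bags contain it are connected in T. All names are illustrative.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check conditions 1-3 for bags (dict node -> set) over a tree."""
    # Condition 1: the bags cover V(G).
    if set().union(*bags.values()) != set(vertices):
        return False
    # Condition 2: every edge of G lies inside some bag.
    for v, w in edges:
        if not any(v in X and w in X for X in bags.values()):
            return False
    # Condition 3: for each vertex, the nodes containing it are connected in T.
    tree_adj = {i: set() for i in bags}
    for i, j in tree_edges:
        tree_adj[i].add(j)
        tree_adj[j].add(i)
    for v in vertices:
        nodes = {i for i, X in bags.items() if v in X}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in tree_adj[i] & nodes:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True


def width(bags):
    """Width of a decomposition: the largest bag size minus one."""
    return max(len(X) for X in bags.values()) - 1
```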

We need the following facts about treewidth. The first fact is trivial. 

— For any complete graph $K_n$ on n vertices, $\mathrm{tw}(K_n) = n - 1$.

The second fact is well known but its proof is not trivial. (See e.g., [12].) 

— The treewidth of the m × m grid is m.

The next fact we need is the improved version of the Robertson & Seymour 
theorem on excluded grid minors [26] due to Diestel et al. [13]. (See also the 
textbook [12].) 

Theorem 1 ([13]). Let r, m be integers, and let G be a graph of treewidth at
least $m^{4r^2(m+2)}$. Then G contains either $K_r$ or the m × m grid as a minor.

A parameter P is any function mapping graphs to nonnegative integers. The
parameterized problem associated with P asks, for a fixed k, whether P(G) ≤ k
for a given graph G.

A parameter P is g(r)-minor-bidimensional if (i) contracting an edge,
deleting an edge, or deleting a vertex in a graph G cannot increase P(G), and (ii)
there exists a function g such that, for the r × r grid R, P(R) ≥ g(r). Similarly,
a parameter P is g(r)-contraction-bidimensional if (i) contracting an edge in a
graph G cannot increase P(G), and (ii) there exists a function g such that, for
any r × r augmented grid R of constant span, P(R) ≥ g(r).¹ Here an r × r
augmented grid of span s is an r × r grid with some extra edges such that each vertex
is attached to at most s non-boundary vertices of the grid. We assume that g(r)
is monotone and invertible for r ≥ 0. We note that a g(r)-minor-bidimensional
parameter is also a g(r)-contraction-bidimensional parameter. One can easily

^ Closely related notions of bidimensional parameters are introduced by the authors 
in [9]. 
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observe that many parameters such as minimum sizes of dominating set, q- 
dominating set (distance g-dominating set for a fixed q), vertex cover, feedback 
vertex set, and edge-dominating set (see exact definitions of the corresponding 
parameters in [11]) are 6>(r^)-minor- or 6*(r^)-contraction-bidimensional param- 
eters. Another example of contraction-bidimensional parameter is the minimum 
length in TSP (Travelling salesman problem), i.e. the smallest number of edges 
in a walk containing all vertices of a graph. 

Here, we present a theorem for minor-bidimensional parameters on general 
minor-closed classes of graphs excluding some fixed graphs, whose intuition plays 
an important role in the main result of this paper. 

Theorem 2. If a $g(r)$-minor-bidimensional parameter $P$ on an $H$-minor-free
graph $G$ has value at most $p$, then $\mathrm{tw}(G) \le 2^{O(g^{-1}(p) \log g^{-1}(p))}$. (The constant
in the $O$ notation depends on $H$.)

Proof. By Theorem 1, since $G$ is $H$-minor-free (and thus $K_{|V(H)|}$-minor-free),
we know if $m$ is the largest integer such that $\mathrm{tw}(G) \ge m^{4|V(H)|^2(m+2)}$, then $G$
has an $m \times m$ grid as a minor. Since $P$ is $g(r)$-minor-bidimensional, $p \ge g(m)$
and thus we obtain the desired bound.
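The last step of this proof can be spelled out as follows. This is a sketch of the omitted calculation, not the authors' wording; it writes $r = |V(H)|$, uses the threshold $m^{4r^2(m+2)}$ of Theorem 1, and assumes $m \ge 2$ so that $\log m > 0$.

```latex
\begin{align*}
\mathrm{tw}(G) &< (m+1)^{4r^2(m+3)}
   && \text{(maximality of } m\text{)} \\
 &= 2^{4r^2(m+3)\log(m+1)} = 2^{O(m \log m)}
   && \text{(} r \text{ is a constant depending only on } H\text{)} \\
 &\le 2^{O(g^{-1}(p)\,\log g^{-1}(p))}
   && \text{(} g(m) \le p \text{ and } g \text{ monotone give } m \le g^{-1}(p)\text{)}
\end{align*}
```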

Theorem 2 can be applied for minor-bidimensional parameters such as vertex 
cover or feedback vertex set. 

The notion of local treewidth was introduced by Eppstein [16] (see also [22]). 
The local treewidth of a graph G is 

$\mathrm{ltw}(G,r) = \max\{\mathrm{tw}(G[N_G^r[v]]) : v \in V(G)\}$.

For a function $f : \mathbb{N} \to \mathbb{N}$ we define the minor-closed class of graphs of bounded
local treewidth

$\mathcal{L}(f) = \{G : \forall H \preceq G\ \forall r \ge 0,\ \mathrm{ltw}(H,r) \le f(r)\}$.

Also we say that a minor-closed class of graphs $\mathcal{C}$ has bounded local treewidth
if $\mathcal{C} \subseteq \mathcal{L}(f)$ for a function $f$.

Well-known examples of minor-closed classes of graphs of bounded local 
treewidth are graphs of bounded treewidth, planar graphs, graphs of bounded 
genus, and single-crossing-minor-free graphs. 

Many difficult graph problems can be solved efficiently when the input is 
restricted to graphs of bounded treewidth (see e.g., Bodlaender’s survey [5]). 
Eppstein [16] made a step forward by proving that some problems like subgraph 
isomorphism and induced subgraph isomorphism can be solved in linear time 
on minor-closed graphs of bounded local treewidth. Also the classic Baker’s 
technique [4] for obtaining approximation schemes on planar graphs for different 
NP-hard problems can be generalized to minor-closed families of bounded local 
treewidth. (See [16] for a generalization of these techniques.) 

An apex graph is a graph G such that, for some vertex v (the apex), G — v is 
planar. The following result is due to Eppstein [16]. 

Theorem 3 ([16]). Let $\mathcal{F}$ be a minor-closed family of graphs. Then $\mathcal{F}$ is of
bounded local treewidth if and only if $\mathcal{F}$ does not contain all apex graphs.
3 Main Theorem

Due to space restrictions we omit the proofs of the following two combinatorial
lemmas.

Lemma 1. Suppose we have an $m \times m$ grid $H$ and a subset $S$ of vertices in the
central $(m - 2k) \times (m - 2k)$ subgrid $H'$, where $s = |S|$ and k = • Then $H$
has as a minor the $k \times k$ grid $R$ such that each vertex in $R$ is a contraction of
at least one vertex in $S$ and other vertices in $H$.

Lemma 2. Let $G \in \mathcal{L}(f)$ be a graph containing the $m \times m$ grid $H$ as a subgraph,
$m > 2k$, where $k = f(2) + 1$. Then the central $(m - 2k) \times (m - 2k)$ subgrid $H'$
has the property that every vertex $v \in V(G)$ is adjacent to less than $k^2$ vertices
in $H'$.

Now we are ready to present the main result of this paper. 

Theorem 4. Let $P$ be a $g(r)$-contraction-bidimensional parameter. Then for
any function $f : \mathbb{N} \to \mathbb{N}$ and any graph $G \in \mathcal{L}(f)$ on which parameter $P$ has
value at most $p$, we have $\mathrm{tw}(G) \le 2^{O(g^{-1}(p) \log g^{-1}(p))}$. (The constant in the $O$
notation depends on $f(1)$ and $f(2)$.)

Proof. Let $r = f(1) + 1$ and $k = f(2) + 1$. Let $G \in \mathcal{L}(f)$ be a graph on which
the parameter $P$ has value $p$. Let $m$ be the largest integer such that
$\mathrm{tw}(G) \ge m^{4r^2(m+2)}$. Without loss of generality, we assume $G$ is connected, and $m > 2k$
(otherwise, $\mathrm{tw}(G)$ is a constant since both $r$ and $k$ are constants). Then $G$
has no complete graph $K_r$ as a minor. By Theorem 1, $G$ contains an $m \times m$
grid $H$ as a minor. Thus there exists a sequence of edge contractions and
edge/vertex deletions reducing $G$ to $H$. We apply to $G$ the edge contractions
from this sequence, we ignore the edge deletions, and instead of deleting a
vertex $v$, we only contract $v$ into one of its neighbors. Call the new graph $G'$,
which has the $m \times m$ grid $H$ as a subgraph and in addition $V(G') = V(H)$. Since
parameter $P$ is contraction-bidimensional, its value on $G'$ will not increase. By
Lemma 2, we know that the central $(m - 2k) \times (m - 2k)$ subgrid $H'$ of $H$ has
the property that every vertex $v \in V(G')$ is adjacent to less than $k^2$ vertices in
$H'$.

Now, suppose in graph $G'$, we further contract all $2k$ boundary rows and $2k$
boundary columns into two boundary rows and two boundary columns (one on
each side) and call the new graph $G''$. Note that here $G''$ and $H'$ have the same
set of vertices. The degree of each vertex of $G''$ to the vertices that are not on the
boundary is at most $(k+1)^2 k^2$, which is a constant since $k$ is a constant. Here
the factor $(k+1)^2$ is for the boundary vertices, each of which is obtained by
contraction of at most $(k+1)^2$ vertices. Again because parameter $P$ is
contraction-bidimensional, its value on $G''$ does not increase and thus it is at most $p$. On
the other hand, since the parameter is $g(r)$-contraction-bidimensional, its value
on graph $G''$ is at least $g(m - 2k)$. Thus $g^{-1}(p) \ge m - 2k$, so $m = O(g^{-1}(p))$.
Therefore, the treewidth of the original graph $G$ is at most $2^{O(g^{-1}(p) \log g^{-1}(p))}$,
as desired.
A direct corollary of Theorem 4 is the following.

Lemma 3. Let $P$ be a contraction-bidimensional parameter. A minor-closed
graph class $\mathcal{F}$ has the parameter-treewidth property for $P$ if $\mathcal{F}$ is of bounded
local treewidth.

The apex graphs $A_i$, $i = 1,2,3,\ldots$, are obtained from the $i \times i$ grid by adding
a vertex $v$ adjacent to all vertices of the grid. It is interesting to see that, for a
wide range of parameters, the converse of Lemma 3 also holds.

Lemma 4. Let $P$ be any contraction-bidimensional parameter where $P(A_i) = O(1)$
for any $i \ge 1$. A minor-closed graph class $\mathcal{F}$ has the parameter-treewidth
property for $P$ only if $\mathcal{F}$ is of bounded local treewidth.

Proof. The proof follows from Theorem 3. The apex graph $A_i$ has diameter
$\le 2$ and treewidth $\ge i$. So a minor-closed family of graphs with the
parameter-treewidth property for $P$ cannot contain all apex graphs and hence it is of
bounded local treewidth.

Typical examples of parameters satisfying Lemmas 3 and 4 are dominating
set and its generalization $q$-dominating set, for a fixed constant $q$ (in which
each vertex can dominate its $q$-neighborhood). These parameters are
$\Theta(r^2)$-contraction-bidimensional and their value is 1 for any apex graph $A_i$, $i \ge 1$.
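The two properties of apex graphs used here (diameter at most 2, domination number 1) are easy to confirm on small instances. The following is a small illustrative check; the helper names and the vertex name `"apex"` are hypothetical.

```python
from collections import deque
from itertools import product


def apex_graph(i):
    """Adjacency dict of A_i: the i x i grid plus one vertex joined to all."""
    grid = list(product(range(1, i + 1), repeat=2))
    adj = {v: set() for v in grid}
    for (a, b) in grid:
        for nb in ((a + 1, b), (a, b + 1)):
            if nb in adj:
                adj[(a, b)].add(nb)
                adj[nb].add((a, b))
    adj["apex"] = set(grid)
    for v in grid:
        adj[v].add("apex")
    return adj


def diameter(adj):
    """Largest BFS distance between any two vertices (graph assumed connected)."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


def is_dominating(adj, D):
    """Every vertex is in D or has a neighbour in D."""
    return all(v in D or adj[v] & D for v in adj)
```

Since the apex vertex alone dominates $A_i$, the dominating set parameter stays at 1 while the treewidth grows with $i$.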

We can strengthen the “if and only if” result provided by Lemmas 3 and 4 
with the following lemma. We just need to use the fact that if the value of P is 
less than the value of P' then the parameter-treewidth property for P implies 
the parameter-treewidth property for P' as well. 

Lemma 5. Let $P$ be a parameter whose value is lower bounded by some
contraction-bidimensional parameter and let $P(A_i) = O(1)$ for any $i \ge 1$. Then
a minor-closed graph class $\mathcal{F}$ has the parameter-treewidth property for $P$ if and
only if $\mathcal{F}$ is of bounded local treewidth.

Lemma 5 can be applied to parameters that are not necessarily
contraction-bidimensional. As an example we mention the clique-transversal number of a
graph, i.e., the minimum number of vertices meeting all the maximal cliques of
a graph.² It is easy to see that this parameter is always at least the domination
number (the size of a minimum dominating set) and that any apex graph $A_i$ has a
clique-transversal set of size 1.

Another application is the $\Pi$-domination number, i.e., the minimum
cardinality of a vertex set that is a dominating set of $G$ and satisfies some property
$\Pi$ in $G$. If this property is satisfied for any one-element subset of $V(G)$ then we
call it regular. Examples of known variants of the parameterized dominating set
problem corresponding to the $\Pi$-domination number for some regular property
$\Pi$ are the following parameterized problems: the independent dominating set
problem, the total dominating set problem, the perfect dominating set problem,
and the perfect independent dominating set problem (see the exact definitions
in [1]).

² The clique-transversal number is not contraction-bidimensional because an edge
contraction may create a new maximal clique and the value of the clique-transversal
number may increase.

We summarize the previous observations with the following: 

Corollary 1. Let $P$ be any of the following parameters: the minimum cardinality
of a dominating set, the minimum cardinality of a $q$-dominating set (for any
fixed $q$), the minimum cardinality of a clique-transversal set, or the minimum
cardinality of a dominating set with some regular property $\Pi$. A minor-closed
family of graphs $\mathcal{F}$ has the parameter-treewidth property for $P$ if and only if $\mathcal{F}$ is
of bounded local treewidth. The function $f(p)$ in the parameter-treewidth property
is $2^{O(\sqrt{p} \log p)}$.
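The bound $2^{O(\sqrt{p}\log p)}$ follows by substituting $g(r) = \Theta(r^2)$ into Theorem 4. The short derivation below is a sketch of that substitution for the dominating set parameter, not a quotation from the paper.

```latex
\begin{align*}
g(r) = \Theta(r^2) \;&\Longrightarrow\; g^{-1}(p) = \Theta(\sqrt{p}), \\
2^{O(g^{-1}(p)\,\log g^{-1}(p))}
  &= 2^{O(\sqrt{p}\,\log \sqrt{p})}
   = 2^{O(\frac{1}{2}\sqrt{p}\,\log p)}
   = 2^{O(\sqrt{p}\,\log p)} .
\end{align*}
```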

4 Algorithmic Consequences and Concluding Remarks 

Courcelle [6] proved a meta-theorem on graphs of bounded treewidth; he showed
that, if $\phi$ is a property of graphs that is definable in monadic second-order logic,
then $\phi$ can be decided in linear time on graphs of bounded treewidth. Frick
and Grohe [21] extended this result to graphs of bounded local treewidth; they
showed that, for each property $\phi$ that is definable in first-order logic and for each
minor-closed class of graphs of bounded local treewidth, there is an
$O(n^{1+\epsilon})$-time algorithm deciding whether a given graph has property $\phi$. However, Frick &
Grohe's proof is not constructive. It uses a transformation of a first-order logic
formula into a "local formula" according to Gaifman's theorem and even the
complexity of this transformation is unknown.

Using Theorems 2 and 4, we can extend the result of Frick & Grohe for 
fixed-parameter algorithms and show that any minor-bidimensional property 
that is solvable in polynomial time on graphs of bounded treewidth is also fixed- 
parameter tractable on general minor-closed graph families excluding some fixed 
graphs, and similarly for any contraction-bidimensional property on minor-closed 
graph families of bounded local treewidth. In contrast to the work of Frick & 
Grohe, the running time of our algorithm is explicit. 

Theorem 5. Let $P$ be a parameter such that, given a tree decomposition of
width at most $w$ for a graph $G$, the parameter can be decided in $h(w)\,n^{O(1)}$
time. Now, if $P$ is a $g(r)$-minor-bidimensional parameter and $G$ belongs to
a minor-closed graph family excluding some fixed graphs, or $P$ is a
$g(r)$-contraction-bidimensional parameter and $G$ belongs to a minor-closed family
of graphs of bounded local treewidth, then we can decide $P$ on $G$ in
$h(2^{O(g^{-1}(k) \log g^{-1}(k))})\,n^{O(1)}$ time.

Proof. The algorithm is as follows. First we check whether $\mathrm{tw}(G)$ is in
$2^{O(g^{-1}(k) \log g^{-1}(k))}$. By Theorems 2 and 4, if it is not, parameter $P$ has value
more than $k$ on graph $G$. This step can be performed by Amir's algorithm [3],
which for a given graph $G$ and integer $w$, either reports that the treewidth of $G$
is at least $w$, or produces a tree decomposition of width at most $(3 + \frac{2}{3})w$ in time
$O(2^{3.698w} n^3 w^3 \log^4 n)$. Thus by using Amir's algorithm we can either compute a
tree decomposition of $G$ of width $2^{O(g^{-1}(k) \log g^{-1}(k))}$ in time
$2^{2^{O(g^{-1}(k) \log g^{-1}(k))}} n^{3+\epsilon}$,
or conclude that the treewidth of $G$ is not in $2^{O(g^{-1}(k) \log g^{-1}(k))}$.

Now if we find a tree decomposition of the aforementioned width, we can
decide $P$ on $G$ in time $h(2^{O(g^{-1}(k) \log g^{-1}(k))})\,n^{O(1)}$. The running time of
this algorithm is the one mentioned in the statement of the theorem.

For example, let $G$ be a graph from a minor-closed family $\mathcal{F}$ of bounded local
treewidth. Since the dominating set of a graph with a given tree decomposition
of width at most $w$ can be computed in time $O(2^{2w} n)$ [1], Theorem 5 gives an
algorithm which either computes a dominating set of size at most $p$, or concludes
that there is no such dominating set, in $2^{2^{O(\sqrt{p} \log p)}} n^{O(1)}$ time. The same
result holds also for computing the minimum size of a $q$-dominating set. Indeed,
Theorem 5 can be applied because the $q$-dominating set of a graph with a given
tree decomposition of width at most $w$ can be computed within the time bound
required by Theorem 5 [8].

Also, algorithms on graphs of bounded treewidth for clique-transversal set and
$\Pi$-domination set appeared in [24] and [1], respectively. Using these algorithms,
and the fact that all these parameters are lower bounded by the domination
number, the methodology of the proof of Theorem 5 can give algorithmic results
for clique-transversal set and $\Pi$-domination set with the same running times as
in the case of dominating set.

Finally, we mention some open problems. For planar graphs and for some
of their extensions, it is known that for any graph $G$ from these classes with
dominating set of size at most $p$, we have $\mathrm{tw}(G) = O(\sqrt{p})$. It is tempting to
ask if such a much smaller bound holds for all minor-closed families of bounded
local treewidth. This would provide subexponential fixed-parameter algorithms on
graphs of bounded local treewidth for the dominating set problem.
Abstract. We show that $l$ vertex disjoint paths between $l$ pairs of vertices
can be found in linear time for co-graphs, but the problem is NP-complete for
graphs of NLC-width at most 4 and clique-width at most 7. This is the
first inartificial graph problem known to be NP-complete on graphs of
bounded clique-width but solvable in linear time on co-graphs and graphs
of bounded tree-width.



1 Introduction 

The clique-width of a graph is defined by a composition mechanism for
vertex-labeled graphs [CO00]. The operations are the vertex disjoint union, the addition
of edges between vertices controlled by a label pair, and the relabeling of vertices.
The clique-width of a graph $G$ is the minimum number of labels needed to define
it. The NLC-width of a graph is defined by a composition mechanism similar
to that for clique-width [Wan94]. Every graph of clique-width at most $k$ has
NLC-width at most $k$ and every graph of NLC-width at most $k$ has clique-width
at most $2k$ [Joh98]. The only essential difference between the composition
mechanisms of clique-width bounded graphs and NLC-width bounded graphs
is the addition of edges. In an NLC-width composition the addition of edges is
combined with the union operation. This union operation applied to two graphs
$G$ and $J$ is controlled by a set $S$ of label pairs such that for every pair $(a,b) \in S$
all vertices of $G$ labeled by $a$ will be connected with all vertices of $J$ labeled
by $b$. We use both notions, because it is sometimes more convenient to
use NLC-width expressions instead of clique-width expressions, and vice versa.

Clique-width and NLC-width bounded graphs are particularly interesting
from an algorithmic point of view. A lot of NP-complete graph problems can
be solved in polynomial time for graphs of bounded clique-width if the
clique-width or NLC-width expression for the graph is explicitly given. For example,
all graph properties which are expressible in monadic second order logic with
quantifications over vertices and vertex sets (MSO$_1$-logic) are decidable in linear
time on clique-width bounded graphs [CMR00]. The MSO$_1$-logic has been
extended by counting mechanisms which allow the expressibility of optimization
problems concerning maximal or minimal vertex sets [CMR00]. All graph problems
expressible in extended MSO$_1$-logic can be solved in polynomial time on
clique-width bounded graphs. Furthermore, there are also a lot of NP-complete
graph problems which are not expressible in extended MSO$_1$-logic like
Hamiltonicity, partition problems, and bounded degree subgraph problems but which
can also be solved in polynomial time on clique-width bounded graphs [EGW01,
KR01, Tod03].

If a graph $G$ has clique-width (NLC-width) at most $k$ then the edge
complement $\overline{G}$ has clique-width at most $2k$ (NLC-width at most $k$) [CO00, Wan94].
Distance hereditary graphs have clique-width at most 3 [GR99]. The set of all
graphs of clique-width at most 2 or NLC-width 1 is the set of all labeled
co-graphs. Brandstädt et al. have analyzed the clique-width of graphs defined by
forbidden one-vertex extensions of $P_4$ [BDLM02]. The clique-width and
NLC-width of permutation graphs, interval graphs, grids and planar graphs are not
bounded [GR99]. An arbitrary graph with $n$ vertices has clique-width at most
$n - r$, if $2^r < n - r$, and NLC-width at most $\lceil n/2 \rceil$ [Joh98]. Every graph of
tree-width¹ at most $k$ has clique-width at most $3 \cdot 2^{k-1}$ [CR01]. In [GW00], it is
shown that every graph of clique-width or NLC-width $k$ which does not contain
the complete bipartite graph $K_{n,n}$ for some $n > 1$ as a subgraph has tree-width
at most $3k(n - 1) - 1$. The recognition problem for graphs of clique-width or
NLC-width at most $k$ is still open for $k \ge 4$ and $k \ge 3$, respectively.
Clique-width of at most 3 is decidable in polynomial time [CHL+00]. NLC-width of at
most 2 is decidable in polynomial time [Joh00]. Clique-width of at most 2 and
NLC-width 1 are decidable in linear time [CPS85]. The clique-width of tree-width
bounded graphs is also computable in linear time [EGW03].

In this paper, we analyze the problem of finding vertex disjoint paths. Given
$l$ pairs of vertices $(s_1,t_1),\ldots,(s_l,t_l)$ and $l$ integers $r_1,\ldots,r_l$, we consider the
problem of finding $r_i$ paths between $s_i$ and $t_i$ for $i = 1,\ldots,l$ whose inner vertices
are all distinct. The vertex disjoint paths problem can be solved in polynomial
time if $l$ is fixed, i.e., not part of the input, and $r_i = 1$ for $i = 1,\ldots,l$ [RS95].
It is NP-complete for $l = 2$ if $r_1$ and $r_2$ are unbounded [EIS76]. It is also
NP-complete if the number $l$ of vertex pairs is unbounded and $r_i = 1$ for $i = 1,\ldots,l$
[MP95].

In Section 3, we show that the vertex disjoint paths problem for co-graphs
can be solved in linear time for $r_i = 1$, $1 \le i \le l$, and in polynomial time for
unbounded $r_i$.

Nishizeki, Vygen, and Zhou have shown in [NVZ01] that the edge disjoint
paths problem is NP-complete for graphs of tree-width at most 2. The edge
disjoint paths problem for a graph $G$ can be solved with an algorithm for the
vertex disjoint paths problem on the line graph of $G$. The line graph of a graph
$G$ has a vertex for every edge of $G$ and an edge between two vertices if the
corresponding edges of $G$ are adjacent. In Section 4, we show that the line graph
of a graph of tree-width $k$ has clique-width at most $2k+3$ and NLC-width at most
$k+2$. This implies our main result that the vertex disjoint paths problem where
all $r_i = 1$ is NP-complete for graphs of clique-width at most 7 and NLC-width
at most 4. This is the first inartificial NP-complete graph problem shown to be
NP-complete for graphs of bounded clique-width but solvable in linear time for
co-graphs and graphs of bounded tree-width [Sch94]. It is also the first problem
that separates co-graphs and clique-width bounded graphs from a complexity
point of view. By inartificial we mean that the problem is not defined solely
for the purpose of being NP-complete for clique-width bounded graphs and solvable
in polynomial time for co-graphs and tree-width bounded graphs.

¹ See Robertson and Seymour [RS86] for a definition of tree-width.



2 Preliminaries 



Let $[k] := \{1,\ldots,k\}$ be the set of all integers between 1 and $k$. We work with
finite undirected labeled graphs $G = (V_G, E_G, \mathrm{lab}_G)$, where $V_G$ is a finite set of
vertices labeled by some mapping $\mathrm{lab}_G : V_G \to [k]$ and
$E_G \subseteq \{\{u,v\} \mid u,v \in V_G,\ u \ne v\}$ is a finite set of edges. A labeled graph
$J = (V_J, E_J, \mathrm{lab}_J)$ is a subgraph of $G$ if $V_J \subseteq V_G$, $E_J \subseteq E_G$ and
$\mathrm{lab}_J(u) = \mathrm{lab}_G(u)$ for all $u \in V_J$. $J$ is an induced subgraph of $G$ if additionally
$E_J = \{\{u,v\} \in E_G \mid u,v \in V_J\}$. The labeled graph consisting of a single vertex
labeled by $a \in [k]$ is denoted by $\bullet_a$.

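Definition 1 (Clique-width, [CO00]). Let $k$ be some positive integer. The
class $\mathrm{CW}_k$ of labeled graphs is recursively defined as follows.

1. The single vertex graph $\bullet_a$ for some $a \in [k]$ is in $\mathrm{CW}_k$.

2. Let $G = (V_G, E_G, \mathrm{lab}_G) \in \mathrm{CW}_k$ and $J = (V_J, E_J, \mathrm{lab}_J) \in \mathrm{CW}_k$ be two vertex
disjoint labeled graphs. Then $G \oplus J := (V', E', \mathrm{lab}')$ defined by $V' := V_G \cup V_J$,
$E' := E_G \cup E_J$, and

$$\mathrm{lab}'(u) := \begin{cases} \mathrm{lab}_G(u) & \text{if } u \in V_G \\ \mathrm{lab}_J(u) & \text{if } u \in V_J \end{cases} \qquad \forall u \in V'$$

is in $\mathrm{CW}_k$.

3. Let $a,b \in [k]$ be two distinct integers and $G = (V_G, E_G, \mathrm{lab}_G) \in \mathrm{CW}_k$ be a
labeled graph, then

a) $\rho_{a \to b}(G) := (V_G, E_G, \mathrm{lab}')$ defined by

$$\mathrm{lab}'(u) := \begin{cases} \mathrm{lab}_G(u) & \text{if } \mathrm{lab}_G(u) \ne a \\ b & \text{if } \mathrm{lab}_G(u) = a \end{cases} \qquad \forall u \in V_G$$

is in $\mathrm{CW}_k$ and

b) $\eta_{a,b}(G) := (V_G, E', \mathrm{lab}_G)$ defined by

$E' := E_G \cup \{\{u,v\} \mid u,v \in V_G,\ u \ne v,\ \mathrm{lab}_G(u) = a,\ \mathrm{lab}_G(v) = b\}$

is in $\mathrm{CW}_k$.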
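The three kinds of clique-width operations translate directly into code. The sketch below is one possible encoding, chosen for illustration: a labeled graph is a pair `(labels, edges)` where `labels` maps each vertex to its label and `edges` is a set of two-element frozensets. All function names are hypothetical.

```python
def single(v, a):
    """The single-vertex graph with vertex v labeled a."""
    return ({v: a}, set())


def disjoint_union(G, J):
    """G (+) J for vertex disjoint labeled graphs."""
    assert not set(G[0]) & set(J[0]), "graphs must be vertex disjoint"
    return ({**G[0], **J[0]}, G[1] | J[1])


def relabel(G, a, b):
    """rho_{a->b}: every vertex labeled a gets label b."""
    labels = {v: (b if lab == a else lab) for v, lab in G[0].items()}
    return (labels, set(G[1]))


def add_edges(G, a, b):
    """eta_{a,b}: join every a-labeled vertex to every b-labeled vertex."""
    labels, edges = G
    new = {
        frozenset({u, v})
        for u in labels
        for v in labels
        if u != v and labels[u] == a and labels[v] == b
    }
    return (labels, edges | new)


# A triangle built with two labels: edge x-y, relabel y to 1, add z, join.
g = add_edges(disjoint_union(single("x", 1), single("y", 2)), 1, 2)
g = relabel(g, 2, 1)
triangle = add_edges(disjoint_union(g, single("z", 2)), 1, 2)
```

The closing example uses only the labels 1 and 2, so it witnesses that the triangle has clique-width at most 2.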



Definition 2 (NLC-width, [Wan94]). Let $k$ be some positive integer. The
class $\mathrm{NLC}_k$ of labeled graphs is recursively defined as follows.

1. The single vertex graph $\bullet_a$ for some $a \in [k]$ is in $\mathrm{NLC}_k$.

2. Let $G = (V_G, E_G, \mathrm{lab}_G) \in \mathrm{NLC}_k$ and $J = (V_J, E_J, \mathrm{lab}_J) \in \mathrm{NLC}_k$ be two
vertex disjoint labeled graphs and $S \subseteq [k]^2$ be a relation, then $G \times_S J :=
(V', E', \mathrm{lab}')$ defined by $V' := V_G \cup V_J$,

$E' := E_G \cup E_J \cup \{\{u,v\} \mid u \in V_G,\ v \in V_J,\ (\mathrm{lab}_G(u), \mathrm{lab}_J(v)) \in S\}$,

and

$$\mathrm{lab}'(u) := \begin{cases} \mathrm{lab}_G(u) & \text{if } u \in V_G \\ \mathrm{lab}_J(u) & \text{if } u \in V_J \end{cases} \qquad \forall u \in V'$$

is in $\mathrm{NLC}_k$.

3. Let $G = (V_G, E_G, \mathrm{lab}_G) \in \mathrm{NLC}_k$ and $R : [k] \to [k]$ be a function, then
$\circ_R(G) := (V_G, E_G, \mathrm{lab}')$ defined by $\mathrm{lab}'(u) := R(\mathrm{lab}_G(u))$, $\forall u \in V_G$, is in
$\mathrm{NLC}_k$.
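The NLC-width operations differ from the clique-width ones only in that edge insertion is folded into the union. A minimal sketch, using the same illustrative encoding of a labeled graph as a pair `(labels, edges)` and hypothetical function names:

```python
def nlc_single(v, a):
    """The single-vertex graph with vertex v labeled a."""
    return ({v: a}, set())


def nlc_union(G, J, S):
    """G x_S J: disjoint union plus an edge {u, v} for every u in G, v in J
    with (lab_G(u), lab_J(v)) in the relation S."""
    assert not set(G[0]) & set(J[0]), "graphs must be vertex disjoint"
    new = {
        frozenset({u, v})
        for u, lu in G[0].items()
        for v, lv in J[0].items()
        if (lu, lv) in S
    }
    return ({**G[0], **J[0]}, G[1] | J[1] | new)


def nlc_relabel(G, R):
    """o_R: apply the label map R (a dict from labels to labels)."""
    return ({v: R[lab] for v, lab in G[0].items()}, set(G[1]))


# The 4-cycle a-b-c-d as an NLC-width 1-expression:
# (a x_{} c) x_{(1,1)} (b x_{} d).
left = nlc_union(nlc_single("a", 1), nlc_single("c", 1), set())
right = nlc_union(nlc_single("b", 1), nlc_single("d", 1), set())
c4 = nlc_union(left, right, {(1, 1)})
```

The example uses a single label, which matches the statement above that graphs of NLC-width 1 are exactly the co-graphs; the 4-cycle is a co-graph.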



The clique-width (NLC-width) of a labeled graph $G$ is the least integer $k$
such that $G \in \mathrm{CW}_k$ ($G \in \mathrm{NLC}_k$, respectively). An expression $X$ built with
the operations $\bullet_a, \oplus, \rho_{a \to b}, \eta_{a,b}$ for integers $a,b \in [k]$ is called a clique-width
$k$-expression. An expression $X$ built with the operations $\bullet_a, \times_S, \circ_R$ for $a \in [k]$,
$S \subseteq [k]^2$, and $R : [k] \to [k]$ is called an NLC-width $k$-expression. The graph
defined by expression $X$ is denoted by $\mathrm{val}(X)$.

A path $p$ of length $r - 1$ in a graph $G = (V_G, E_G, \mathrm{lab}_G)$ is a sequence of $r$
vertices $p = (u_1,\ldots,u_r)$ such that $\{u_i,u_{i+1}\} \in E_G$ for $i = 1,\ldots,r-1$ is an
edge of $G$. Two paths $p = (u_1,\ldots,u_r)$ and $q = (v_1,\ldots,v_{r'})$ are vertex disjoint if
$\{u_2,\ldots,u_{r-1}\} \cap \{v_1,\ldots,v_{r'}\} = \emptyset$ and $\{u_1,\ldots,u_r\} \cap \{v_2,\ldots,v_{r'-1}\} = \emptyset$. That
is, the inner vertices of $p$ do not occur in $q$, and vice versa.
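Note that this definition allows two paths to share end vertices. The check below makes that explicit; it is a direct transcription of the two set conditions, with a hypothetical function name.

```python
def vertex_disjoint(p, q):
    """True if no inner vertex of p occurs in q and no inner vertex of q
    occurs in p. Paths are given as sequences of vertices."""
    inner_p = set(p[1:-1])
    inner_q = set(q[1:-1])
    return not (inner_p & set(q)) and not (set(p) & inner_q)
```

Two paths (s, u, t) and (s, v, t) between the same pair are vertex disjoint, whereas a path that runs through a terminal of another path is not.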

We analyze the following decision problem. 

PROBLEM: Vertex Disjoint Paths 

INSTANCE: Graph $G = (V_G, E_G, \mathrm{lab}_G)$, $l$ vertex pairs $(s_1,t_1),\ldots,(s_l,t_l)$, all
$s_1,\ldots,s_l$ and $t_1,\ldots,t_l$ distinct, and $l$ positive integers $r_1,\ldots,r_l$.

QUESTION: Are there $r_i$ paths between $s_i$ and $t_i$ for $i = 1,\ldots,l$ such that all
paths are mutually vertex disjoint?



3 Polynomial Time Solutions 

Let $G$ be a co-graph defined by some NLC-width 1-expression $X$. Expression $X$
can be found for a given graph $G$ in linear time using any linear time recognition
algorithm for co-graphs that computes the co-tree for $G$, see for example [CPS85].

The vertices of the vertex pairs $(s_1,t_1),\ldots,(s_l,t_l)$ are called terminal vertices
and all other vertices are called free vertices. We assume that the terminal
vertices are explicitly specified in the NLC-width 1-expression $X$. That is, we know
which sub-expression $\bullet_1$ of $X$ represents terminal vertex $s_i$ or $t_i$ for $1 \le i \le l$.

It is well known that a graph is a co-graph if and only if it has no induced $P_4$,
i.e., it has no induced path of length 3. That is, to solve the vertex disjoint paths
problem for co-graphs we only have to look for paths that consist of exactly two
or three vertices, the start vertex $s_i$, at most one free vertex $u_i$, and the target
vertex $t_i$, for $1 \le i \le l$.
Theorem 1. The vertex disjoint paths problem, for $r_i = 1$, $i = 1,\ldots,l$, is
decidable in linear time for co-graphs.

Proof. If two terminal vertices $s_i,t_i$, $1 \le i \le l$, are adjacent, we can remove
them from $G$ by modifying $X$. This can be done in linear time. The remaining
graph is still a co-graph. Hence, we can assume that all $s_i,t_i$, $1 \le i \le l$, are
non-adjacent.

The paths $(s_i,u_i,t_i)$ of length two, where $s_i$ and $t_i$ are not adjacent in $G$, can
only be constructed in a composition step of the form $Y \times_{\{(1,1)\}} Z$, where either
$\mathrm{val}(Y)$ or $\mathrm{val}(Z)$ contains both terminal vertices $s_i,t_i$ and the other graph $\mathrm{val}(Z)$
or $\mathrm{val}(Y)$, respectively, contains the free vertex $u_i$. Since all vertices have the
same label, label 1, every free vertex of $\mathrm{val}(Y)$ will be connected by operation
$Y \times_{\{(1,1)\}} Z$ with every terminal vertex of $\mathrm{val}(Z)$, and vice versa.

Let $G'$ be the induced subgraph of $G$ represented by a subexpression $X'$ of
$X$. Then let

1. $R(X')$ be the number of pairs $s_i,t_i$, $1 \le i \le l$, contained in $G'$,

2. $F(X')$ be the number of free vertices in $G'$, and

3. $M(X')$ be the maximal number of vertex disjoint paths $(s_i,u_i,t_i)$ in subgraph
$G'$, for free vertices $u_i$, $1 \le i \le l$.

By definition, graph $G$ defined by $X$ has a solution for the vertex disjoint
paths problem if and only if $R(X) = M(X)$. The main part now is to show that
all $R(X')$, $F(X')$, and $M(X')$ are computable in linear time. This can be done
recursively as follows:

- Let X' = * 1 . 

If the single vertex u of val(X') is a free vertex then 
R{X') = 0, F{X') = 1, and M{X') = 0 
otherwise, if u is a terminal vertex, then 
R{X') = 0, F{X') = 0, and M{X') = 0. 

~ Let X' = Y X 0 Z. (Operation X 0 does not create any new edge.) 

Let Ry.z be the number of pairs Si,ti, 1 < i < I, such that val(T) contains 

terminal vertex Si or ti and val(Z) contains the other terminal vertex ti or 

Si, respectively. Then 

R{X') =R{Y) + R{Z) + Ry.z, 

f\x') = f\y) + f\z), and 

M{X') = M{Y) + M{Z). 

— Let X' = Y X{(i 1 )} Z. 

Then 

R{X') = R{Y) + R{Z) + Ry.z, 
f\x') = f\y) + f\z), and 

M{X') = M{Y) + M{Z) + min{i?(T) - M(Y), F{Z) - M{Z)} 

+ mm{R{Z) - M{Z),F{Y) - M{Y)}. 

F{Z) — M{Z) free vertices of val(Z) can be used to realize some of the still 
missing R{Y) — M{Y) paths in val(F), and vice versa. 

This computation can be done in linear time by traversing bottom-up the
NLC-width 1-expression. The necessary values $R_{Y,Z}$ at the operations $\times_{\emptyset}$ and
$\times_{\{(1,1)\}}$ can be computed in a preprocessing step. We just have to compute in the
expression tree of $X$ the nearest common ancestor for every pair $(s_i,t_i)$. Such
a preprocessing can be done for all pairs $(s_i,t_i)$ together in linear time with
respect to the size of the expression tree of $X$, see [AGKR02].
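The recurrences for $R$, $F$ and $M$ can be sketched as a bottom-up pass over the expression tree. The code below is an illustration under several assumptions that are not from the paper: expressions are nested tuples `("leaf", v)`, `("union", Y, Z)` for $\times_\emptyset$ and `("join", Y, Z)` for $\times_{\{(1,1)\}}$; adjacent terminal pairs are assumed to have been removed already; and $R_{Y,Z}$ is obtained by carrying the set of "half-seen" pairs upward instead of the nearest-common-ancestor preprocessing, so this sketch is simple rather than linear time.

```python
def has_disjoint_paths(expr, pairs):
    """Decide the r_i = 1 case on a co-graph given as an NLC-width
    1-expression. pairs: list of (s_i, t_i), assumed non-adjacent."""
    pair_of = {}
    for idx, (s, t) in enumerate(pairs):
        pair_of[s] = idx
        pair_of[t] = idx

    def rec(e):
        # Returns (R, F, M, open_pairs); open_pairs holds the indices of
        # pairs with exactly one terminal inside this sub-expression.
        if e[0] == "leaf":
            v = e[1]
            if v in pair_of:
                return 0, 0, 0, {pair_of[v]}
            return 0, 1, 0, set()
        _, Y, Z = e
        Ry, Fy, My, Oy = rec(Y)
        Rz, Fz, Mz, Oz = rec(Z)
        R = Ry + Rz + len(Oy & Oz)  # len(Oy & Oz) plays the role of R_{Y,Z}
        F = Fy + Fz
        M = My + Mz
        if e[0] == "join":
            # Unused free vertices on one side serve unrealized pairs on the other.
            M += min(Ry - My, Fz - Mz) + min(Rz - Mz, Fy - My)
        return R, F, M, Oy ^ Oz

    R, _, M, _ = rec(expr)
    return R == M
```

For instance, joining one free vertex $u$ to the disjoint union of $s$ and $t$ yields $R = M = 1$, so the single pair can be connected through $u$.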

Theorem 2. The vertex disjoint paths problem is decidable in polynomial time 
for co-graphs. 

Proof. In a first step, we decrement $r_i$ by one if $s_i$ and $t_i$ are adjacent in $G$.
Then we define a bipartite graph $H$ as follows. $H$ has a vertex $w_{i,j}$ for $1 \le i \le l$
and $1 \le j \le r_i$ and a vertex $v_k$ for every free vertex $u_k$ of $G$. Vertex $v_k$ of $H$ is
connected with all vertices $w_{i,j}$, $1 \le i \le l$, $1 \le j \le r_i$, in $H$ if and only if free
vertex $u_k$ of $G$ is adjacent to $s_i$ and $t_i$ in $G$.

Graph $H$ is bipartite and can be constructed in polynomial time. Graph $G$
has $r_i$ paths of length 2 between $s_i$ and $t_i$, $1 \le i \le l$, if and only if $H$ has a
matching of size $\sum_{i=1}^{l} r_i$. A maximum matching in bipartite graphs with $n$
vertices and $m$ edges can be found in time $O(\sqrt{n} \cdot m \cdot \log(n^2/m) / \log(n))$ [FM91].
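The reduction to bipartite matching can be sketched as follows. This is an illustration only: it uses the simple augmenting-path method (often attributed to Kuhn) rather than the faster algorithm of [FM91], it builds $H$ implicitly, and the function and variable names are hypothetical. The input graph is assumed to be a co-graph given as an adjacency dict.

```python
def has_disjoint_paths_with_demands(adj, pairs, demands):
    """adj: dict vertex -> set of neighbours; pairs: list of (s_i, t_i);
    demands: list of positive integers r_i."""
    terminals = {x for pair in pairs for x in pair}
    free = [v for v in adj if v not in terminals]

    # One left vertex w_{i,j} per path that still needs a free inner vertex.
    left = []
    for i, ((s, t), r) in enumerate(zip(pairs, demands)):
        if t in adj[s]:
            r -= 1  # the direct edge s_i - t_i realizes one path
        left.extend([i] * r)

    match = {}  # free vertex -> index into left

    def augment(a, seen):
        s, t = pairs[left[a]]
        for u in free:
            if u in adj[s] and u in adj[t] and u not in seen:
                seen.add(u)
                if u not in match or augment(match[u], seen):
                    match[u] = a
                    return True
        return False

    # A matching covering every left vertex exists iff every augmentation succeeds.
    return all(augment(a, set()) for a in range(len(left)))
```

On the 4-cycle s-u-t-v with the single pair (s, t), two paths exist (through u and through v) but three do not.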

4 NP-Completeness 

If we consider edge disjoint paths instead of vertex disjoint paths, then we get
the edge disjoint paths problem. The edge disjoint paths problem for a graph $G$
can be solved with an algorithm for the vertex disjoint paths problem applied
to the line graph of $G$. The line graph of a graph $G = (V_G,E_G)$ has a vertex for
every edge of $G$ and an edge between two vertices if the corresponding edges in
$G$ have a common vertex. The edge disjoint paths problem is NP-complete even
for series parallel graphs, i.e., for graphs of tree-width at most 2 [NVZ01] and
$r_i = 1$, $i = 1,\ldots,l$.
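The line graph construction described above is short enough to state in code. This is a minimal sketch with a hypothetical function name; edges of $G$ are given as pairs and become the vertices of the line graph.

```python
from itertools import combinations


def line_graph(edges):
    """Return (vertices, edges) of the line graph of a simple graph.

    Each edge of G becomes a vertex (a two-element frozenset); two such
    vertices are adjacent when the corresponding edges share an end vertex.
    """
    E = {frozenset(e) for e in edges}
    L_edges = {frozenset({e, f}) for e, f in combinations(E, 2) if e & f}
    return E, L_edges
```

For example, the line graph of the star with three leaves is a triangle, since all three edges meet at the centre.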

We now show that the line graph of a graph of tree-width $k$ has NLC-width
at most $k+2$. Graphs of tree-width at most $k$ are also called partial $k$-trees.
A partial $k$-tree is a subgraph of a $k$-tree, which can recursively be defined as
follows. The complete graph with $k$ vertices is a $k$-tree. If $G$ is a $k$-tree then
the graph obtained by inserting a new vertex $u$ and $k$ edges between $u$ and all
vertices of a $k$-vertex complete subgraph of $G$ is again a $k$-tree.

Theorem 3. The line graph of a partial $k$-tree has NLC-width at most $k+2$.

Proof. It suffices to show that the line graph of a $k$-tree $G$ has NLC-width at
most $k+2$, because the line graph of every subgraph of $G$ is an induced subgraph
of the line graph of $G$, and the set of all graphs of NLC-width at most $k+2$ is
closed under induced subgraphs.

Let $G = (V_G,E_G)$ be a $k$-tree with $n$ vertices. Let $o = (u_1,\ldots,u_n)$ be an
order of the $n$ vertices of $G$. Let $N^-(G,o,i)$ and $N^+(G,o,i)$ for $i = 1,\ldots,n$ be
the set of neighbors $u_j$ of vertex $u_i$ with $j < i$ and $j > i$, respectively. That is,

$N^-(G,o,i) := \{u_j \mid \{u_i,u_j\} \in E_G \wedge j < i\}$ and

$N^+(G,o,i) := \{u_j \mid \{u_j,u_i\} \in E_G \wedge j > i\}$.
A vertex order $(u_1,\ldots,u_n)$ for $G$ is called a perfect elimination order (PEO) for
$G$ if the vertices of $N^+(G,o,i)$ for $i = 1,\ldots,n$ induce a complete subgraph of
$G$.

There is always a PEO $o = (u_1,\ldots,u_n)$ for $G$ such that the vertices of every
$N^+(G,o,i)$ for $i = 1,\ldots,n-k$ induce a $k$-vertex complete subgraph and the
vertices of every $N^+(G,o,i)$ for $i = n-k+1,\ldots,n$ induce an $(n-i)$-vertex
complete subgraph of $G$. Here we can use, for example, the reverse order of the
vertices from the recursive definition of the $k$-tree.

The structure of the $k$-tree $G$ can be characterized by the tree $T(G,o) =
(V_T,E_T)$ defined as follows. Let $o = (u_1,\ldots,u_n)$ be a perfect elimination order
for $G$.



$V_T := V_G$, $\quad E_T := \{\{u_i,u_j\} \in E_G \mid i < j \wedge \forall j',\ i < j' < j,\ \{u_i,u_{j'}\} \notin E_G\}$.

Graph $T(G,o)$ is a tree, because every vertex $u_i$, $i < n$, of $T(G,o)$ is incident
with exactly one edge $\{u_i,u_j\}$ for $j > i$.
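Both the PEO condition and the tree $T(G,o)$ are straightforward to compute from an adjacency structure. The following sketch uses hypothetical function names and represents $G$ as a dict from vertices to neighbour sets; each edge of $T(G,o)$ links a vertex to its earliest later neighbour in the order.

```python
def is_peo(adj, order):
    """True if, for every vertex, its neighbours later in `order`
    induce a complete subgraph."""
    pos = {u: i for i, u in enumerate(order)}
    for u in order:
        later = [w for w in adj[u] if pos[w] > pos[u]]
        if any(b not in adj[a] for a in later for b in later if a != b):
            return False
    return True


def elimination_tree(adj, order):
    """Edges (u_i, u_j) of T(G, o): u_j is the first neighbour of u_i
    that comes after u_i in the order."""
    pos = {u: i for i, u in enumerate(order)}
    edges = set()
    for u in order:
        later = [w for w in adj[u] if pos[w] > pos[u]]
        if later:
            edges.add((u, min(later, key=pos.get)))
    return edges


# A 2-tree built as: edge a-b, then c joined to {a, b}, then d joined to {b, c}.
two_tree = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
reverse_construction = ["d", "c", "b", "a"]
```

With the reverse construction order, every vertex except the last has exactly one outgoing tree edge, so the result has $n - 1$ edges as expected for a tree.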

Let col be a $(k+1)$-coloring of $G$, i.e., $\mathrm{col} : V_G \to [k+1]$ is a mapping with
$\mathrm{col}(u_i) \ne \mathrm{col}(u_j)$ for all $\{u_i,u_j\} \in E_G$. It is easy to see that each $k$-tree is
$(k+1)$-colorable if we assign to $u_i$ any color not used by the vertices of $N^+(G,o,i)$.
We now recursively define for $i = 1,\ldots,n$ an NLC-width $(k+2)$-expression $X_i$
as follows. Let $N^-(T(G,o),o,i) = \{u_{j_1},\ldots,u_{j_m}\}$ and $N^+(G,o,i) = \{u_{l_1},\ldots,u_{l_r}\}$.
Note that $\{u_{j_1},\ldots,u_{j_m}\}$ is defined by tree $T(G,o)$ and $\{u_{l_1},\ldots,u_{l_r}\}$ is
defined by $G$.

1. If $m = 1$ then let $Y_i = X_{j_1}$. If $m > 1$ then let $Y_i = X_{j_1} \times_I \cdots \times_I X_{j_m}$, where
$I = \{(s,s) \mid s \in [k+1]\}$ is the identity between the labels $1,\ldots,k+1$.
Graph $\mathrm{val}(Y_i)$ is the disjoint union of $m$ graphs $\mathrm{val}(X_{j_1}),\ldots,\mathrm{val}(X_{j_m})$,
where equal labeled vertices from different graphs are joined by an edge.
Note that relation $I$ uses only the labels $1,\ldots,k+1$. The label $k+2$ is
exclusively used for the vertices that will not be connected with other vertices
in any further composition step.

2. If $r > 0$ then let $Z_i$ be an NLC-width $(k+1)$-expression that defines a complete
graph with $r$ vertices labeled by $\mathrm{col}(u_{l_1}),\ldots,\mathrm{col}(u_{l_r})$. Note that these $r \le k$
labels are distinct and do not include the color of $u_i$.

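3. Then we define

$$X_i := \begin{cases} \circ_R(Y_i \times_S Z_i) & \text{if } m > 0 \text{ and } r > 0 \\ Z_i & \text{if } m = 0 \text{ and } r > 0 \\ \circ_R(Y_i) & \text{if } m > 0 \text{ and } r = 0 \end{cases}$$

where $S = \{(s, s) \mid s \in [k+1]\} \cup \{(\mathrm{col}(u_i), s) \mid s \in [k+1]\}$ and

$$R(s) := \begin{cases} s & \text{if } s \neq \mathrm{col}(u_i) \\ k+2 & \text{if } s = \mathrm{col}(u_i). \end{cases}$$

It remains to show that the NLC-width $(k+2)$-expression $X_n$ defines the line
graph of $k$-tree $G$.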

By the definition of $Z_i$ for $i = 1, \ldots, n$ there is a one to one correspondence
between the edges of $G$ and the vertices of the graph defined by $X_i$. We say,
the vertex of the graph defined by $Z_i$ which is labeled by $s$ represents the edge
between $u_i$ and the unique vertex of $N^+(G, o, i)$ colored by $s$. In this way, there
is also a one to one correspondence between the edges of $G$ and the vertices of
the graph defined by $X_n$, although all labels are finally changed into label $k+2$.
The vertices in the graph defined by $X_i$ which are labeled by some label of $[k+1]$
will represent exactly those edges of $G$ with one end vertex from $\{u_1, \ldots, u_i\}$
and one end vertex from $\{u_{i+1}, \ldots, u_n\}$.

Let us describe more precisely the graphs that the expressions $X_i$ define for
$i = 1, \ldots, n$. Let $G_i$, $1 \leq i \leq n$, be the subgraph of $G$ induced by the vertices
$\{u_1, \ldots, u_i\}$. Let $G'_i$ be the connected component of $G_i$ to which $u_i$ belongs, and
let $G''_i$ be the graph $G'_i$ extended by all the edges (and their end vertices) of $G$
that have exactly one end vertex in $G'_i$. A simple induction on $i$ will show that
$X_i$ defines the line graph of $G''_i$.

Basis: Let $i = 1$.

Graph $G''_1$ consists of $1 + |N^+(G, o, 1)|$ vertices and $|N^+(G, o, 1)|$ edges between
$u_1$ and the vertices from $N^+(G, o, 1)$. In this case, graph val$(X_1)$ is a complete
graph with $|N^+(G, o, 1)|$ vertices labeled by the colors of the vertices of
$N^+(G, o, 1)$. The graph defined by $X_1$ obviously represents the line graph of $G''_1$.

Induction: Let $i > 1$.

Let $N^-(T(G, o), o, i) = \{u_{j_1}, \ldots, u_{j_m}\}$ and $N^+(G, o, i) = \{u_{l_1}, \ldots, u_{l_r}\}$. If $m = 0$,
then as in the case where $i = 1$, $G''_i$ consists of $1 + |N^+(G, o, i)|$ vertices
and $|N^+(G, o, i)|$ edges between $u_i$ and the vertices from $N^+(G, o, i)$. Here also,
expression $X_i$ defines a complete graph with $|N^+(G, o, i)|$ vertices labeled by the
colors of the vertices of $N^+(G, o, i)$.

If $m > 0$, then we first define an expression $Y_i$ for the union of all the graphs
defined by the expressions $X_{j_1}, \ldots, X_{j_m}$ in which equal labeled vertices from
different graphs are connected by edges. These equal labeled vertices from different
graphs have to be connected by edges because they will represent edges with one
end vertex from $\{u_1, \ldots, u_{i-1}\}$ and the same end vertex from $\{u_i, \ldots, u_n\}$.

By the inductive hypothesis and the additionally inserted edges, expression
$Y_i \times_S Z_i$ now defines a graph that represents the line graph of $G''_i$. Relation $S$
connects a vertex $u$ of the graph defined by $Y_i$ and a vertex $v$ of the graph defined
by $Z_i$ if one of the following holds.

1. Both vertices have the same label from $[k+1]$.

Then $u$ and $v$ represent two adjacent edges $\{u_{i'}, u_{j'}\}$ and $\{u_i, u_{j'}\}$ of $G$,
respectively, where $i' < i < j'$.

2. The label of $u$ is the color $\mathrm{col}(u_i)$ of $u_i$ in $G$.

Then $u$ represents an edge $\{u_{i'}, u_i\}$ of $G$ where $i' < i$. These edges are all
adjacent with the edges represented by the vertices of the graph defined by
$Z_i$.

The final relabeling $\circ_R$ changes label $\mathrm{col}(u_i)$ into label $k+2$, because the
vertices labeled by $\mathrm{col}(u_i)$ now represent only edges $e$ of $G$ such that for all
edges adjacent to $e$ both end vertices are already contained in the graph defined
by $X_i$.

Since $G''_n$ is the graph $G$, we have defined an NLC-width $(k+2)$-expression for
the line graph of $G$.

For a given graph $G = (V_G, E_G)$, $l$ vertex pairs $(u_1, v_1), \ldots, (u_l, v_l)$, and $l$
integers $r_1, \ldots, r_l$ let $G' = (V_{G'}, E_{G'})$ be the graph $G$ with $2 \cdot l$ additional vertices
$u'_1, \ldots, u'_l, v'_1, \ldots, v'_l$ and $2 \cdot l$ additional edges $\{u'_1, u_1\}, \ldots, \{u'_l, u_l\}, \{v'_1, v_1\}, \ldots,
\{v'_l, v_l\}$. Let $H$ be the line graph of $G'$ and $s_i$ and $t_i$ be the vertices of $H$ that
represent the edges $\{u'_i, u_i\}$ and $\{v'_i, v_i\}$, $1 \leq i \leq l$, of $G'$. Then there are $r_i$
mutually vertex disjoint paths in $H$ between $s_i$ and $t_i$ for $i = 1, \ldots, l$ if and
only if there are $r_i$ mutually edge disjoint paths in $G$ between $u_i$ and $v_i$ for
$i = 1, \ldots, l$. If $G$ is a partial 2-tree then $G'$ is a partial 2-tree. Since the problem
of finding edge disjoint paths is NP-complete for partial 2-trees, see [NVZ01], we
have shown the following theorem.

Theorem 4. The vertex disjoint paths problem is NP-complete for graphs of 
NLC-width at most 4. 
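
The reduction behind this theorem can be sketched in a few lines. This is only an illustrative sketch: the helper names and the tagging of the new pendant vertices as `("u'", i)` and `("v'", i)` are assumptions made here, not notation from the paper.

```python
from itertools import combinations

def add_terminals(edges, pairs):
    """Return the edges of G' (G plus a pendant edge at every u_i and v_i)
    together with the list of terminal pairs (s_i, t_i) of the line graph."""
    extended = [frozenset(e) for e in edges]
    terminals = []
    for idx, (u, v) in enumerate(pairs):
        s = frozenset((("u'", idx), u))   # pendant edge {u'_i, u_i}
        t = frozenset((("v'", idx), v))   # pendant edge {v'_i, v_i}
        extended += [s, t]
        terminals.append((s, t))
    return extended, terminals

def line_graph(edges):
    """Adjacency of the line graph: two edges are adjacent iff they share an end vertex."""
    adj = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if e & f:
            adj[e].add(f)
            adj[f].add(e)
    return adj
```

For the 4-cycle 1-2-3-4-1 with the single pair (1, 3), the two edge disjoint paths 1-2-3 and 1-4-3 of $G$ show up in $H$ as two vertex disjoint paths between $s_1$ and $t_1$.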

A simple modification of the proof of Theorem 3 shows that the line graph
of every partial $k$-tree has clique-width at most $2k + 3$. This implies that the
vertex disjoint paths problem is NP-complete for graphs of clique-width at most
7. Theorem 3 also implies that the chromatic index of a graph of bounded
tree-width can be computed in polynomial time, because the chromatic number problem
for NLC-width and clique-width bounded graphs can be solved in polynomial
time [EGW01]. This re-proves a result by Bodlaender [Bod90]. Note also that
the proof of Theorem 3 is constructive, i.e., an NLC-width expression and a
clique-width expression can simply be constructed from a given partial $k$-tree.
Abstract. In this note, we establish that any interval or circular-arc
graph with n vertices admits a partition into O(log n) proper interval
subgraphs. This bound is shown to be asymptotically sharp for an infinite
family of interval graphs. Moreover, the constructive proof yields a
linear-time and space algorithm to compute such a partition. The second part
of the paper is devoted to an application of this result, which has actually
inspired this research: the design of an efficient approximation algorithm
for an NP-hard problem of planning working schedules.



1 Introduction 

An undirected graph $G = (V, E)$ is an interval graph if to each vertex $v \in V$ can be
associated an open (resp. closed) interval $I_v$ of the real line, such that any pair of
distinct vertices $u, v$ are connected by an edge of $E$ if and only if $I_u \cap I_v \neq \emptyset$. The
family $\{I_v\}_{v \in V}$ is an interval representation of $G$; the left and right endpoints of
$I_v$ are respectively denoted by $le(I_v)$ and $re(I_v)$. The edges of the complement
graph $\overline{G}$ are transitively orientable by setting $u \to v$ if $r_u < l_v$; the orientation
of the edges induces a partial order called interval order (we shall write $I_u \prec I_v$
if $r_u < l_v$). In the same way, the intersection graph of a collection of arcs on a
circle is called a circular-arc graph. A circular-arc representation of an undirected
graph $G$ which fails to cover some point $p$ on the circle is topologically the
same as an interval representation of $G$. In effect, we can cut the circle at $p$ and
straighten it out into a line, the arcs becoming intervals. It is therefore easy to see
that every interval graph is a circular-arc graph.

An interval graph $G$ is called a proper interval graph if there is an interval
representation of $G$ such that no interval properly contains another. A nice result
of Roberts (1969, cf. [13,6]) establishes that proper interval graphs coincide with
unit interval graphs, the interval graphs having an interval representation such
that all intervals have the same size, and with $K_{1,3}$-free interval graphs, the interval
graphs without an induced copy of a tree composed of one central vertex and three
leaves.
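
As a small illustration of these definitions, the sketch below builds the intersection graph of a list of open intervals and tests whether a given representation is proper. The function names are hypothetical, and the second function checks one representation only: a graph may be a proper interval graph even if a particular representation of it is not proper.

```python
from itertools import combinations

def interval_graph_edges(intervals):
    """Index pairs (a, b), a < b, whose open intervals (l, r) intersect."""
    return [(a, b)
            for a, b in combinations(range(len(intervals)), 2)
            if intervals[a][0] < intervals[b][1]
            and intervals[b][0] < intervals[a][1]]

def is_proper_representation(intervals):
    """True if no interval of the list properly contains another."""
    for (l1, r1), (l2, r2) in combinations(intervals, 2):
        if (l1, r1) == (l2, r2):
            continue  # identical intervals do not contain each other properly
        if (l1 <= l2 and r2 <= r1) or (l2 <= l1 and r1 <= r2):
            return False
    return True

# One long interval over three disjoint ones yields an induced K_{1,3}.
CLAW = [(0, 10), (1, 2), (4, 5), (7, 8)]
# Three unit-length intervals form a proper representation of a path.
UNIT = [(0, 2), (1, 3), (2, 4)]
```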

* The author is a Ph.D. student in Computer Science and Mathematics. 
The main result. Interval and circular-arc graphs have been intensively studied
for several decades by both discrete mathematicians and theoretical computer
scientists. These two classes of graphs are particularly known for providing
numerous models in diverse areas like scheduling, genetics, psychology, sociology,
archaeology and others. For surveys on results and applications concerning
interval and circular-arc graphs, the interested reader is referred to [13,6,8].

In this note, the problem of partitioning interval or circular-arc graphs into
proper interval subgraphs is investigated. Two questions can be raised concerning
this problem. The first, more likely asked by the mathematician, is: can one find good
lower and upper bounds on the size of a minimum partition of an interval or
circular-arc graph into proper interval subgraphs? The second, more likely asked by
the computer scientist, is: can one find an efficient algorithm to compute such
a minimum partition? An answer to the first question is given in this paper,
through the following theorem. Although the result provides some advances on
the second question (discussed in the Conclusion), the latter remains open to our
knowledge.

Theorem 1. Any interval graph or circular-arc graph with n vertices admits a
partition into O(log n) proper interval subgraphs. Moreover, this bound is
asymptotically sharp for an infinite family of interval graphs.

The constructive proof of the result (described in Section 2) yields a linear-time
and space algorithm to compute such a partition. Thereby, this result could find
applications in the design of approximation algorithms for hard problems on
interval or circular-arc graphs, since many problems that are intractable for these
graphs become easier for proper interval graphs. In the second part of the paper, we
present such an application in the area of working schedules planning,
which has actually inspired this research.



Applications. The problem of planning working schedules holds an important
place in operations research and business administration. In a schematic way,
the problem consists in the assignment of fixed tasks to employees in the form of
shifts. The tasks of the shift allocated to an employee, which induce his working
schedule, must be pairwise disjoint (non-intersecting). Here a problem derived
from schedules planning problems solved by the firm Prologia - Groupe Air
Liquide [12] is considered. This fundamental problem, denoted WSP, is defined
as follows. Let $\{T_i\}_{i=1,\ldots,n}$ be a set of tasks having respective starting and ending
dates $(l_i, r_i)$. The regulation imposes that no employee can execute more
than $k$ tasks. Given that the tasks allocated to an employee must not overlap,
build an optimal planning according to the following objectives: on a first level,
reduce the number of shifts or employees (productivity) and then on a second
level, balance the planning (social) and prevent as well as possible the future
modifications of the planning (robustness).

Since the tasks are simply some intervals of the real line, the WSP problem
can be reformulated in graph-theoretic terms as the problem of coloring an
interval graph such that each color marks at most $k$ vertices. When the planning
is cyclic, we obtain the same coloring problem with circular-arc graphs. In this
model, the optimization criteria become respectively: to minimize the number of
colors (P), balance the number of vertices in each color class (S) and maximize
the smallest gap existing between two consecutive intervals or arcs having the
same color (R). In fact, the criterion R prevents overlaps when some intervals
or arcs are delayed or put forward. Hence, a solution to WSP is called (P)-optimal
(resp. (S,R)-optimal) if it is optimal according to criterion P (resp. criteria S
and R). Then, a (P|S,R)-optimal solution is defined to be one which is
(S,R)-optimal among all (P)-optimal solutions.
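
To fix ideas, the three criteria can be evaluated on a candidate solution as sketched below. This is a sketch under assumptions: the name `evaluate` is hypothetical, and measuring criterion S by the difference between the largest and smallest class size is one possible reading of "balance", not a definition taken from the paper.

```python
def evaluate(shifts):
    """Score a solution given as a non-empty list of shifts, each shift being
    a list of pairwise disjoint tasks (l, r).

    Returns (number_of_colors, imbalance, min_gap):
      P: number of shifts, to be minimized;
      S: max size minus min size over the shifts (assumed balance measure);
      R: smallest gap between consecutive tasks of a shift, to be maximized.
    """
    sizes = [len(shift) for shift in shifts]
    gaps = []
    for shift in shifts:
        ordered = sorted(shift)
        for (_, r1), (l2, _) in zip(ordered, ordered[1:]):
            gaps.append(l2 - r1)
    min_gap = min(gaps) if gaps else float("inf")
    return len(shifts), max(sizes) - min(sizes), min_gap
```

For instance, the two shifts `[(0, 2), (5, 7)]` and `[(1, 3), (4, 6)]` give two colors, a perfectly balanced planning and a smallest gap of 1.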

The complexity of WSP for interval graphs was recently investigated with
the single optimization criterion P. Bodlaender and Jansen [2] have shown that
this is an NP-hard problem even for fixed $k \geq 4$; the problem for $k = 3$ remains
an open question to our knowledge. For $k = 2$, it is solved in linear time and
space by matching techniques [1,5]. Unless P = NP, the inherent hardness of the
problem forces us to design efficient heuristics for finding "good" solutions.
To this end, linear-time approximations are presented for the WSP in the second
part of the paper (Section 3). A classical algorithm is briefly described which
achieves a constant worst-case ratio for the single criterion P. Unfortunately,
such an algorithm offers no guarantee on the satisfaction of criteria S and R.
Surprisingly, the WSP problem for proper interval graphs is proved to be solvable
in a (P|R,S)-optimal way by a greedy algorithm. Thus, an idea is to partition
the input interval graph into proper interval subgraphs and solve the problem
optimally on each subgraph using the greedy algorithm. Obviously, the quality of such
a local optimization depends strongly on how the input interval graph is partitioned.
Hence, the theorem previously cited enables us to design a new algorithm which
achieves a logarithmic worst-case ratio for criterion P, and moreover guarantees
that (P|R,S)-optima are reached in a logarithmic number of subproblems.
Finally, we remark that in real-life situations, i.e. under certain conditions, the
logarithmic worst-case ratio becomes constant.



Preliminaries. Before giving the first results, some useful notations and
definitions are detailed. All the graph-theoretic terms not defined here can be found
in [13,6]. Let $G = (V, E)$ be an undirected graph. For simplicity, $n$ and $m$ denote
respectively the number of vertices and edges of $G$ throughout the paper. A
complete set or clique is a set of pairwise connected vertices. The clique number
$\omega(G)$ is the cardinality of the largest clique in $G$. Conversely, an independent
set or stable is a set of pairwise non-connected vertices. A coloring of $G$
associates to each vertex one color in such a way that two connected vertices
have different colors. In fact, a coloring of $G$ corresponds to a partition of $G$
into stables. The chromatic number $\chi(G)$ is the cardinality of a partition of $G$
into the least number of stables. In the same way, $\chi(G, k)$ is defined to be the
size of a minimum partition of $G$ into stables of size at most $k$. The quality of
our approximation algorithms in relation to the criterion P is measured by their
worst-case ratio defined as $\sup_G \{|S| / \chi(G, k)\}$ where $S$ is any partition of $G$ into
stables of size at most $k$ output by the algorithm.
2 The Proof of Theorem 1

Although offering only a linear upper bound, the following lemma is crucial in
the proof of the theorem.

Lemma 1. Let $G = (V, E)$ be a $K_{1,t}$-free interval graph with $t \geq 3$. Then $G$
admits a partition into $\lfloor t/2 \rfloor$ proper interval subgraphs. Moreover, this partition
is computed in linear time and space.

Proof. An algorithm is proposed for computing such a partition. In short,
the algorithm extracts and colors greedily some cliques of $G$ with the set of
colors $\{1, \ldots, \lfloor t/2 \rfloor\}$; the output is the partition of $G$ induced by these $\lfloor t/2 \rfloor$ colors.

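Algorithm ColorCliques

input: a $K_{1,t}$-free interval graph $G = (V, E)$ with $t \geq 3$;
output: a partition of $G$ into $\lfloor t/2 \rfloor$ proper interval subgraphs;
begin
    compute an interval representation $I_1, \ldots, I_n$ of $G$;
    order $I_1, \ldots, I_n$ according to the left endpoints;
    $i \leftarrow 1$, $j \leftarrow 1$;
    while $i \leq n$ do
        $C_j \leftarrow \{I_i\}$, $Left \leftarrow I_i$, $i \leftarrow i + 1$;
        while $i \leq n$ and $Left \cap I_i \neq \emptyset$ do
            $C_j \leftarrow C_j \cup \{I_i\}$;
            if $re(I_i) < re(Left)$ then $Left \leftarrow I_i$;
            $i \leftarrow i + 1$;
        $c \leftarrow (j - 1) \bmod \lfloor t/2 \rfloor + 1$, $\mathcal{C}^c \leftarrow \mathcal{C}^c \cup \{C_j\}$, $j \leftarrow j + 1$;
    return $\mathcal{C}^1, \ldots, \mathcal{C}^{\lfloor t/2 \rfloor}$;
end;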
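
A direct transcription of Algorithm ColorCliques into Python may help to follow the proof. It is a sketch under assumptions: intervals are taken as open pairs `(l, r)` given directly (the representation is not computed from a graph), each color class is returned as a flat list of intervals rather than as a set of cliques, and the sort costs $O(n \log n)$ instead of the linear time claimed for an already ordered representation.

```python
def color_cliques(intervals, t):
    """Greedy clique extraction; returns floor(t/2) lists of open intervals."""
    order = sorted(intervals)            # ordered by left endpoint
    colors = t // 2
    classes = [[] for _ in range(colors)]
    i, j = 0, 0                          # 0-based counterparts of i and j
    while i < len(order):
        clique, left = [order[i]], order[i]
        i += 1
        # Extend the clique while the next interval meets `left`, the member
        # with the smallest right endpoint seen so far. Since the intervals
        # are sorted by left endpoint, intersection reduces to one comparison.
        while i < len(order) and order[i][0] < left[1]:
            clique.append(order[i])
            if order[i][1] < left[1]:
                left = order[i]
            i += 1
        classes[j % colors].extend(clique)   # color c = (j - 1) mod colors + 1
        j += 1
    return classes
```

On the $K_{1,4}$-free family `[(0, 10), (1, 2), (4, 5), (7, 8)]` with `t = 4`, three cliques are extracted and the first color class receives `(0, 10)`, `(1, 2)` and `(7, 8)`, which induce a path.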

Since computing an ordered interval representation is done in $O(n + m)$ time
and space [4,9], the algorithm runs in linear time and space. Its correctness is
established by showing that the color class $\mathcal{C}^c$ induces a proper interval graph for
any $c \in \{1, \ldots, \lfloor t/2 \rfloor\}$. Let $\mathcal{C}^c = \{C^c_1, \ldots, C^c_q\}$ be the set of cliques assigned to
color $c$ by the algorithm (in the order of their extraction). If $q \leq 2$ then $\mathcal{C}^c$ is trivially
$K_{1,3}$-free. Otherwise, suppose that $\mathcal{C}^c$ contains an induced subgraph $K_{1,3}$ with $I_a$
its central vertex and $I_b \prec I_c \prec I_d$ its three leaves. Clearly, the leaves belong to
disjoint cliques: set $I_b \in C^c_u$, $I_c \in C^c_v$ and $I_d \in C^c_w$ with $u < v < w \in \{1, \ldots, q\}$.
According to the algorithm, $I_a$ belongs necessarily to $C^c_u$. Now, from every clique
$C_j$ colored by the algorithm between $C^c_u$ and $C^c_w$, select the interval having the
smallest right endpoint in $C_j$ and add it to the set $S$, initially empty. We claim
that $S$ induces a stable of size at least $2\lfloor t/2 \rfloor + 1$. If two intervals of $S$ are
intersecting, then they belong to the same colored clique, a contradiction. At
least $\lfloor t/2 \rfloor$ cliques are colored by the algorithm from $C^c_u$ to $C^c_v$ exclusive and
still at least $\lfloor t/2 \rfloor$ from $C^c_v$ to $C^c_w$ exclusive. Thus, $S$ contains at least $2\lfloor t/2 \rfloor + 1$
elements, which proves the claim. Since $I_a \in C^c_u$ and $I_a \cap I_d \neq \emptyset$, $I_a$ intersects
every interval in $S$ except maybe the rightmost one, which belongs to $C^c_w$.
This last interval is replaced in $S$ by the interval $I_d$; in effect, $I_d$ cannot intersect
the last but one interval of $S$ (otherwise $I_d \notin C^c_w$, a contradiction). Finally,
since $2\lfloor t/2 \rfloor + 1 \geq t$ for all $t \geq 3$, we obtain that at least $t$ disjoint intervals
are overlapped by $I_a$, which contradicts the fact that $G$ is $K_{1,t}$-free.
Therefore, the color class $\mathcal{C}^c$ indeed induces a $K_{1,3}$-free interval graph, i.e. a proper
interval graph by Roberts' theorem (cf. [13,6]), and the correctness of the
algorithm is established. □

Remark. In Algorithm ColorCliques, the assignment of colors is done according
to the basic ordering $\{1, \ldots, \lfloor t/2 \rfloor\}$. The correctness holds by using any
permutation of the set $\{1, \ldots, \lfloor t/2 \rfloor, 1, \ldots, \lfloor t/2 \rfloor\}$, repeated as many times as necessary
to complete the assignment (the proof remains the same). Notably, this implies
that there exist at least $(2t)!/2^t t!$ non-isomorphic partitions of a $K_{1,t}$-free
interval graph into proper interval graphs. Note that determining the minimum
value $t$ for which $G$ is $K_{1,t}$-free can be done in $O(n^2)$ time by computing the
largest stable [7] contained in each interval of its representation $I_1, \ldots, I_n$.

Lemma 2. Any interval graph $G = (V, E)$ admits a partition into less than
$\lceil \log_3((n + 1)/2) \rceil$ $K_{1,5}$-free interval subgraphs. Moreover, this partition is
computed in linear time and space.

Before giving the proof of the lemma, we need to establish this useful claim.

Claim. Any interval graph $G = (V, E)$ admits an open (resp. closed) interval
representation such that every interval has positive integer endpoints no larger than
$n$ (resp. $2n$). Moreover, this representation is computed in linear time and space.

Proof. Let $A = (a_{ij})$ be the maximal cliques-versus-vertices incidence matrix
of $G$. A (0,1)-matrix has the consecutive 1's property for columns if its rows
can be permuted in such a way that the 1's in each column occur consecutively.
A well-known characterization of interval graphs is that the matrix $A$ has the
consecutive 1's property for columns and no more than $n$ rows (Fulkerson-Gross
1965, cf. [6]). Thereby, consider a representation of $A$ with the 1's consecutive
in each column and for each $v \in V$, set $le(I_v) = \min\{i \mid a_{iv} = 1\}$ and $re(I_v) =
\max\{i \mid a_{iv} = 1\}$. Clearly, the open interval representation $\{I_v\}_{v \in V}$ is such that
every endpoint is in $\{1, \ldots, n\}$. This interval representation is correct because
two intervals are intersecting if and only if their two corresponding vertices are
connected. Computing the matrix $A$ with consecutive 1's is done in $O(n + m)$
time and space [9]. Therefore, the complexity of the previous construction is
linear. Finally, a closed interval representation is obtained from the previous
open interval representation. Sort all the endpoints (left and right mixed) in
ascending order. For $i = 1, \ldots, 2n$, assign to the $i$th endpoint the value $i$ and
then redefine the $n$ intervals as closed with their new endpoints in $\{1, \ldots, 2n\}$.
Since the order on the endpoints is unchanged, the interval graph remains the
same. Moreover, sorting $2n$ integers in $\{1, \ldots, n\}$ is done in $O(n)$ time using
$O(n)$ space, which concludes the proof. □
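
The last step of the proof, turning an open representation into a closed one with endpoints in $\{1, \ldots, 2n\}$, can be sketched as follows. Two points are assumptions of this sketch and not statements of the paper: ties between equal endpoints are broken by ranking right endpoints before left endpoints (so that touching open intervals stay disjoint once closed), and a comparison sort is used instead of the linear-time sort mentioned in the proof.

```python
def to_closed_integer_representation(intervals):
    """Map open intervals (l, r) to closed ones with endpoints in {1, ..., 2n}."""
    events = []
    for idx, (l, r) in enumerate(intervals):
        events.append((l, 1, idx))   # left endpoint
        events.append((r, 0, idx))   # right endpoint; sorts first on ties
    events.sort()
    left, right = {}, {}
    for rank, (_, is_left, idx) in enumerate(events, start=1):
        if is_left:
            left[idx] = rank
        else:
            right[idx] = rank
    return [(left[i], right[i]) for i in range(len(intervals))]
```

For the open intervals `(1, 3)`, `(2, 4)`, `(3, 5)` the result is `[1, 3]`, `[2, 5]`, `[4, 6]`: the first and third intervals, which only touch at 3, remain disjoint.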
Proof (of Lemma 2). According to the Claim, compute in linear time and space
an open interval representation $I_1, \ldots, I_n$ of $G$ with endpoints in $\{1, \ldots, n\}$
and denote by $\ell$ the maximum length of an interval ($\ell \leq n - 1$). Then,
partition the intervals according to their length into $\lceil \log_3((\ell + 2)/2) \rceil$ subsets as
follows: $\mathcal{I}_1$ contains the intervals of length $\{1, 2, 3, 4\}$, $\mathcal{I}_2$ the intervals of length
$\{5, 6, \ldots, 16\}$, ..., $\mathcal{I}_i$ the intervals of length $\{2 \cdot 3^{i-1} - 1, \ldots, 2 \cdot 3^i - 2\}$. We
claim that each subset $\mathcal{I}_i$ induces a $K_{1,5}$-free interval graph. Indeed, the contrary
implies that one interval of $\mathcal{I}_i$ properly contains three disjoint intervals whose
sum of lengths is at most $2 \cdot 3^i - 4$, which is a contradiction (the minimum
sum of three intervals is $3(2 \cdot 3^{i-1} - 1) = 2 \cdot 3^i - 3$). Note that the proof
remains correct by starting with a closed interval representation with endpoints in
$\{1, \ldots, 2n\}$ and partitioning such that each set $\mathcal{I}_i$ contains the intervals of length
$\{4 \cdot 3^{i-1} - 3, \ldots, 4 \cdot 3^i - 4\}$ for $i = 1, \ldots, \lceil \log_3((\ell + 4)/4) \rceil$ (here $\ell \leq 2n - 1$). □
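
The length classes used in this proof (open-interval case) are easy to compute with integer arithmetic only. The following sketch uses hypothetical function names; class $i$ collects the integer lengths from $2 \cdot 3^{i-1} - 1$ to $2 \cdot 3^i - 2$.

```python
def length_class(length):
    """Smallest i >= 1 with length <= 2 * 3**i - 2 (open-interval case)."""
    i = 1
    while length > 2 * 3**i - 2:
        i += 1
    return i

def partition_by_length(intervals):
    """Group open intervals (l, r) with integer endpoints by length class."""
    classes = {}
    for l, r in intervals:
        classes.setdefault(length_class(r - l), []).append((l, r))
    return classes
```

Lengths 1 to 4 fall into class 1, lengths 5 to 16 into class 2 and lengths 17 to 52 into class 3, in agreement with the sets listed in the proof.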

Remark. In fact, we can prove more generally that any interval graph $G = (V, E)$
admits a partition into $O(\log_t n)$ $K_{1,t+2}$-free interval subgraphs for any integer
$t \geq 3$.

Proposition 1. Any interval graph (resp. circular-arc graph) $G = (V, E)$
admits a partition into less than $2\lceil \log_3((n + 1)/2) \rceil$ (resp. $2\lceil \log_3((n + 1)/2) \rceil + 1$)
proper interval subgraphs. Moreover, this partition is computed in linear time
and space.

Proof. The proof of the bound for interval graphs follows immediately from the
combination of Lemmas 2 and 1 (with $t = 5$). For circular-arc graphs, compute first
a circular-arc representation of $G$ in linear time and space [10]. Now, choose one
point $p$ on the circle and compute the set of vertices $V_p$ corresponding to the
arcs which contain $p$. By observing that $V_p$ forms a clique and the subgraph
induced by $V \setminus V_p$ is an interval graph, we obtain the desired bound for
circular-arc graphs (any clique induces trivially a proper interval graph). □

The first half of Theorem 1 is established through the previous proposition, 
while the second is established via the next proposition. 

Proposition 2. For infinitely many $r$, the complete $r$-partite graph $H_r = (S_1 \cup
\cdots \cup S_r, E)$ with $|S_1| = 1, \ldots, |S_r| = 3^{r-1}$ admits no partition into less than
$\log_3(2n + 1)$ proper interval subgraphs.

Proof. An interval representation of the graph $H_r$ is built by defining recursively
the $r$ stables $S_1, \ldots, S_r$ as follows. The stable $S_1$ consists of one open interval of
length $3^{r-1}$. For all $i = 2, \ldots, r$, the stable $S_i$ is obtained by copying the stable
$S_{i-1}$ and subdividing each interval of this one into three open intervals of equal
length. The resulting stables $S_1, \ldots, S_r$ indeed induce a complete $r$-partite graph.
Note that the number of vertices of $H_r$ is given by (*) $n = \sum_{i=1}^{r} 3^{i-1} = (3^r - 1)/2$.

Since any stable induces trivially a proper interval graph, $H_r$ admits a
partition into $r$ proper interval subgraphs. Now, using induction, we show that
any minimum partition of $H_r$ into proper interval subgraphs has the cardinality
$p(H_r) = r$. First, one can easily verify that $p(H_1) = 1$ and $p(H_2) = 2$; then, the
induction hypothesis is $p(H_{i-1}) = i - 1$ for $i > 2$. Now, suppose that $p(H_i) < i$ and
consider a partition of $H_i$ into $i - 1$ sets $\mathcal{I}_1, \ldots, \mathcal{I}_{i-1}$ of proper intervals. Without
loss of generality, the single interval $I^* \in S_1$ belongs to $\mathcal{I}_1$. We claim that the
intervals of $\mathcal{I}_1 \setminus I^*$ induce at most two disjoint cliques. In effect, the contrary
implies the existence of an induced subgraph $K_{1,3}$ in $\mathcal{I}_1$ (with $I^*$ as central
vertex and one interval in each disjoint clique as leaves). According to this claim,
at least one interval of $S_2$ and all the intervals stemming from its subdivision
in $S_3, \ldots, S_i$ do not belong to $\mathcal{I}_1$. Clearly, such a set of intervals induces the
graph $H_{i-1}$ and by induction hypothesis, needs $i - 1$ sets to be partitioned into
proper interval subgraphs. However, only the $i - 2$ sets $\mathcal{I}_2, \ldots, \mathcal{I}_{i-1}$ are available
to realize that, which leads to a contradiction. This completes the induction by
obtaining that $p(H_i) = i$ for $i > 2$. The equality (*) is finally used to obtain
$p(H_r) = \log_3(2n + 1)$. □
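
The nested family of intervals described in this proof can be generated explicitly, which gives a quick check of the vertex count (*). The sketch below is an illustration with a hypothetical function name; it places $S_1$ at $(0, 3^{r-1})$, an arbitrary choice of origin, so that all endpoints are integers.

```python
def nested_stables(r):
    """Return [S_1, ..., S_r]; S_i holds 3**(i-1) open intervals of length 3**(r-i),
    obtained by cutting every interval of S_{i-1} into three equal parts."""
    stables = []
    for i in range(1, r + 1):
        length = 3 ** (r - i)
        stables.append([(j * length, (j + 1) * length)
                        for j in range(3 ** (i - 1))])
    return stables
```

For $r = 3$ the stables have sizes 1, 3 and 9, so $n = 13 = (3^3 - 1)/2$ and $\log_3(2n + 1) = 3$.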

Corollary 1. For every $t \geq 3$, a $K_{1,t}$-free interval graph with at
most $\lfloor (3t - 4)/2 \rfloor$ vertices exists which admits no partition into less than
$\lfloor \log_3(t - 1) \rfloor + 1$ proper interval subgraphs.

Proof. The graph $H_r$ defined in Proposition 2 is clearly $K_{1,t}$-free for $t \in \{3^{r-1} +
1, \ldots, 3^r\}$. By simple calculation, we deduce that $H_r$ has at most $\lfloor (3t - 4)/2 \rfloor$
vertices and admits no partition into less than $\lfloor \log_3(t - 1) \rfloor + 1$ proper interval
subgraphs for $t \in \{3^{r-1} + 1, \ldots, 3^r\}$. □

3 Applications to Working Schedules Planning

A classical approximation. In this subsection, a classical algorithm is pre- 
sented to approximate WSP with interval graphs. Here are two propositions, 
partially established in [5], which are behind its proof. 

Proposition 3. A minimum coloring of an interval graph $G = (V, E)$ such that
the number $s(G)$ of stables consisting of only one vertex is as small as possible
is computed in linear time and space.

Proposition 4. Let $G = (V, E)$ be an interval graph and $k$ an integer. If $G$
is colored such that each color is used at least $k$ times, then $G$ admits an
optimal partition into $\lceil n/k \rceil$ stables of size at most $k$. Moreover, this partition is
computed in linear time and space given the coloring in input.

Algorithm 2-ApproxWSP

input: an interval graph $G = (V, E)$, an integer $k$;
output: a solution $S$ to the WSP problem for $G$;
begin
    compute a minimum coloring $C = \{S_1, \ldots, S_{\chi(G)}\}$ of $G$ with $s(G)$ minimum;
    $S \leftarrow \emptyset$;
    for each $S_i \in C$ do
        if $|S_i| < k$ then $C \leftarrow C \setminus \{S_i\}$, $S \leftarrow S \cup \{S_i\}$;
    compute an optimal partition $S_k$ of $C$ into stables of size at most $k$;
    $S \leftarrow S \cup S_k$;
    return $S$;
end;

Theorem 2. Algorithm 2-ApproxWSP achieves in linear time and space the
asymptotic worst-case ratio $2(k - 1)/k$ for the criterion P. Moreover, this
worst-case ratio is tight.

Proof. Omitted here. □

Remark. A similar algorithm can be designed to approximate WSP for
circular-arc graphs with worst-case ratio 3: first determine in linear time a coloring using
less than $2\omega(G)$ colors and then use Proposition 4, which remains correct for
circular-arc graphs, to find a solution to WSP.



A greedy for proper interval graphs. Here a greedy algorithm is presented 
which solves the WSP problem for proper interval graphs. 

Algorithm GreedyProperWSP

input: a proper interval graph $G = (V, E)$, an integer $k$;
output: a solution $S$ to the WSP problem for $G$;
begin
    compute a proper interval representation $I_1, \ldots, I_n$ of $G$;
    order $I_1, \ldots, I_n$ according to the left endpoints;
    compute $\omega(G)$ and $\chi(G, k) \leftarrow \max\{\omega(G), \lceil n/k \rceil\}$;
    $S_1 \leftarrow \cdots \leftarrow S_{\chi(G,k)} \leftarrow \emptyset$;
    for $i$ from 1 to $n$ do
        $j \leftarrow (i - 1) \bmod \chi(G, k) + 1$, $S_j \leftarrow S_j \cup \{I_i\}$;
    $S \leftarrow \{S_1, \ldots, S_{\chi(G,k)}\}$;
    return $S$;
end;

Computing an ordered proper interval representation of $G$ is done in $O(n + m)$
time and space [3] and $\omega(G)$ is computed in $O(n)$ time [7]. Consequently, the
algorithm runs in linear time and space.
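
The greedy algorithm is short enough to be sketched in full. The code below is an illustration under assumptions: the input is taken to be a proper representation by open intervals `(l, r)` given directly, the clique number is obtained by a sweep over sorted endpoints (a hypothetical helper, not the method of [7]), and sorting makes the sketch run in $O(n \log n)$ rather than linear time.

```python
def clique_number(intervals):
    """Largest number of open intervals sharing a common point (sweep line)."""
    events = sorted([(l, 1) for l, _ in intervals] +
                    [(r, -1) for _, r in intervals])
    best = current = 0
    for _, delta in events:      # at equal coordinates, closings come first
        current += delta
        best = max(best, current)
    return best

def greedy_proper_wsp(intervals, k):
    """Deal the intervals, ordered by left endpoint, round-robin into
    chi = max(omega, ceil(n / k)) shifts."""
    order = sorted(intervals)
    n = len(order)
    chi = max(clique_number(order), -(-n // k))   # -(-n // k) is ceil(n / k)
    shifts = [[] for _ in range(chi)]
    for i, interval in enumerate(order):
        shifts[i % chi].append(interval)
    return shifts
```

With the six intervals `(0, 2), (1, 3), ..., (5, 7)` and `k = 2`, the clique number is 2 and $\lceil 6/2 \rceil = 3$, so three shifts of two disjoint tasks each are produced.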

Lemma 3. The output solution $S$ is (P|S)-optimal.

Proof. First, we claim that the output stables $S_1, \ldots, S_{\chi(G,k)}$ have a size at most
$k$. According to the algorithm, the stables have the same size (to within one unit
if $n$ is not a multiple of $k$). Then, the existence of one stable of size strictly larger
than $k$ implies that $n > k\chi(G, k)$, a contradiction. Additionally, this establishes
the (S)-optimality of $S$. Now, suppose that two intervals $I_u, I_v$ with $u < v$ are
intersecting in the stable $S_j$ for some $j \in \{1, \ldots, \chi(G, k)\}$. By the algorithm, we
have $u = j + \alpha\chi(G, k)$ and $v = j + \beta\chi(G, k)$ with $\alpha < \beta$. When the intervals
are proper, the right endpoints have the same order as the left endpoints. Then,
the intervals $I_u, I_{u+1}, \ldots, I_{v-1}, I_v$ include the portion $[l_v, r_u]$ of the real line and
also induce a clique of size $v - u + 1 = (\beta - \alpha)\chi(G, k) + 1 \geq \chi(G, k) + 1$. Such a
clique implies that $\omega(G) > \chi(G, k)$, which is a contradiction and the correctness
of the solution $S$ is entirely proved. To conclude, $S$ is (P|S)-optimal because
$\max\{\omega(G), \lceil n/k \rceil\}$ is a lower bound for $\chi(G, k)$. □



Lemma 4. The output solution $S$ is (P|R)-optimal.

Proof (Sketch). The (P)-optimality of $S$ is established by Lemma 3. Now,
suppose that the set $S_1, \ldots, S_{\chi(G,k)}$ is not (P|R)-optimal. Define $S^*_1, \ldots, S^*_{\chi(G,k)}$ to
be a (P|R)-optimal solution and $g^*$ the minimum gap between two consecutive
intervals of this solution. Recall that the intervals $I_1, \ldots, I_n$ are ordered
according to the left endpoints, and let $I_{u,t}$ denote the interval of rank $t$ in the stable $S^*_u$.
We claim that for all $i = 1, \ldots, n$, the interval $I_i \in S^*_u$ can be moved to the rank
$t = \lfloor (i - 1)/\chi(G, k) \rfloor + 1$ of the stable set $S^*_v$ with $v = (i - 1) \bmod \chi(G, k) + 1$,
without decreasing $g^*$. After such an operation, the resulting set $S^*_1, \ldots, S^*_{\chi(G,k)}$
coincides exactly with the solution $S_1, \ldots, S_{\chi(G,k)}$ of the greedy, which establishes
its (P|R)-optimality. The claim is proved by an inductive process whose initial
step is done as follows. If $I_1 \in S^*_u$ with $u \neq 1$, exchange the entire set of intervals
of $S^*_u$ with the one of $S^*_1$. Clearly, $g^*$ is not deteriorated (no gap is modified) and $I_1$
is correctly placed. Now, the inductive step is proved; the intervals $I_1, \ldots, I_{i-1}$
are considered to be correctly placed. The interval $I_i \in S^*_u$ shall be moved to the
stable $S^*_v$ if $u \neq v$. Then, two cases are distinguished.

Case $u < v$:

$S^*_u = \{I_{u,1}, \ldots, I_{u,t}, I_i, \ldots, I_{u,j}, \ldots\}$ and $S^*_v = \{I_{v,1}, \ldots, I_{v,t-1}, I_{v,t}, \ldots, I_{v,j}, \ldots\}$.
By induction hypothesis, we get $re(I_{v,t-1}) \leq re(I_{u,t})$ and
$le(I_i) \leq le(I_{v,t})$. Since $re(I_{u,t}) \leq le(I_i)$, we obtain the inequalities (i)
$re(I_{v,t-1}) \leq re(I_{u,t}) \leq le(I_i) \leq le(I_{v,t})$ which allow us to redefine
$S^*_u = \{I_{u,1}, \ldots, I_{u,t}, I_{v,t}, \ldots, I_{v,j}, \ldots\}$ and $S^*_v = \{I_{v,1}, \ldots, I_{v,t-1}, I_i, \ldots, I_{u,j}, \ldots\}$.

Two gaps are changed: $le(I_i) - re(I_{u,t})$ in $S^*_u$ becomes $le(I_{v,t}) - re(I_{u,t})$ and
$le(I_{v,t}) - re(I_{v,t-1})$ in $S^*_v$ becomes $le(I_i) - re(I_{v,t-1})$. According to (i), the new
gaps are at least as large as the minimum of the two old ones.

Case $u > v$:

$S^*_u = \{I_{u,1}, \ldots, I_{u,t-1}, I_i, \ldots, I_{u,j}, \ldots\}$ and $S^*_v = \{I_{v,1}, \ldots, I_{v,t-1}, I_{v,t}, \ldots, I_{v,j}, \ldots\}$.
Here the induction hypothesis provides the inequalities (ii)
$re(I_{v,t-1}) \leq re(I_{u,t-1}) \leq le(I_i) \leq le(I_{v,t})$ and we redefine
$S^*_u = \{I_{u,1}, \ldots, I_{u,t-1}, I_{v,t}, \ldots, I_{v,j}, \ldots\}$ and $S^*_v = \{I_{v,1}, \ldots, I_{v,t-1}, I_i, \ldots, I_{u,j}, \ldots\}$.

According to (ii), the two new gaps in $S^*_u$ and $S^*_v$ are still at least as large as the
minimum of the two old ones.

The analysis of these two cases shows the correctness of the inductive step
and completes the proof of the claim. □



Theorem 3. Algorithm GreedyProperWSP determines in linear time and space
(P|S,R)-optimal solutions to the problem WSP for proper interval graphs.
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The logarithmic approximation with sub-optima. According to the 
previous discussions, a new approximation algorithm is designed for WSP with 
interval graphs. 

Algorithm log-ApproxWSP 
input: an interval graph G = (V, E), an integer k; 
output: a solution S to the WSP problem for G; 
begin 
  S ← ∅; 
  if G is a proper interval graph then S ← GreedyProperWSP(G, k); 
  else 
    partition G into B(n) proper interval subgraphs G_1, ..., G_{B(n)}; 
    for each subgraph G_i do S ← S ∪ GreedyProperWSP(G_i, k); 
  return S; 
end; 



Theorem 4. Algorithm log-ApproxWSP achieves in linear time and space the 
absolute worst-case ratio $\min\{k, B(n)\}$ with $B(n) = 2\lfloor \log_3((n+1)/2) \rfloor$ for the 
criterion P and guarantees that (P|S,R)-optima are reached in B(n) subproblems. 
Moreover, the worst-case ratio is asymptotically tight. 

Proof. Correctness and complexity follow from Theorems 1 and 3, plus the fact 
that recognizing a proper interval graph is done in linear time and space [3]. 
To complete the proof, the worst-case ratio is established. If G is a proper 
interval graph then S is optimal. Otherwise, we have 
$|S| = \sum_{i=1}^{B(n)} \chi(G_i,k)$. By using the inequalities 
$\sum_{i=1}^{B(n)} \chi(G_i,k) \le n \le k \cdot \chi(G,k)$ and 
$\sum_{i=1}^{B(n)} \chi(G_i,k) \le \sum_{i=1}^{B(n)} \chi(G,k) \le B(n) \cdot \chi(G,k)$, 
we obtain the result. 

Finally, we give an interval graph G for which the ratio $\min\{k, B(n)\}$ is 
asymptotically tight, with $B(n) = 2\lfloor \log_3((n+1)/2) \rfloor$ and $k = B(n)$. The complete 
proof is not detailed here; without loss of generality, we assume that n is a 
multiple of B(n) and set $N(n) = n/B(n) - 1$. The interval graph is modeled 
by the following set of open intervals. For $i = 1, \dots, B(n)/2$, take one interval 
$(1, 2 \cdot 3^i - 1)$, one interval $(1, 2 \cdot 3^{i-1})$, N(n) intervals 
$(2 \cdot 3^{i-1}, 4 \cdot 3^{i-1} - 1)$ and N(n) intervals $(4 \cdot 3^{i-1} - 1, 2 \cdot 3^i - 2)$ 
(see Fig. 1 for an example of construction). Note that the endpoints do lie in 
{1, ..., n} and that G is not a proper interval graph. In this case, one can verify 
that the approximation ratio of Algorithm log-ApproxCIGk is 

$$\frac{|S|}{\chi(G,k)} = \frac{(B(n)/2)\,(2N(n)+1)}{2(B(n)/2) + N(n) - 1} = B(n) \cdot \frac{n - B(n)/2}{n + B^2(n) - 2B(n)} \;\underset{n \to \infty}{\sim}\; B(n).$$

□ 
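The construction used in the tightness argument can be checked numerically. The following is a minimal sketch, assuming the floor in B(n) and the interval endpoints as read from the proof; the function names `B`, `tight_instance` and `claimed_sizes` are illustrative and do not come from the paper. It only rebuilds the intervals and evaluates the two closed-form counts from the proof; it does not run GreedyProperWSP or compute an optimal solution.

```python
from fractions import Fraction


def B(n: int) -> int:
    """B(n) = 2 * floor(log_3((n + 1) / 2)) for n >= 1, using integers only."""
    e = 0
    # Largest e with 3**e <= (n + 1) / 2, i.e. 2 * 3**e <= n + 1.
    while 2 * 3 ** (e + 1) <= n + 1:
        e += 1
    return 2 * e


def tight_instance(n: int):
    """Open intervals of the worst-case construction (assumes B(n) divides n)."""
    b = B(n)
    if b == 0 or n % b != 0:
        raise ValueError("n must be a positive multiple of B(n)")
    copies = n // b - 1  # N(n) in the text
    intervals = []
    for i in range(1, b // 2 + 1):
        low, high = 3 ** (i - 1), 3 ** i
        intervals.append((1, 2 * high - 1))
        intervals.append((1, 2 * low))
        intervals += [(2 * low, 4 * low - 1)] * copies
        intervals += [(4 * low - 1, 2 * high - 2)] * copies
    return intervals


def claimed_sizes(n: int):
    """(|S|, chi(G, k)) as given by the closed forms in the proof."""
    b = B(n)
    copies = n // b - 1
    return (b // 2) * (2 * copies + 1), 2 * (b // 2) + copies - 1


if __name__ == "__main__":
    size_s, chi = claimed_sizes(24)
    print(B(24), len(tight_instance(24)), size_s, chi, Fraction(size_s, chi))
```

For n = 24 the sketch yields 24 intervals with endpoints inside {1, ..., 24}, which agrees with the values quoted in the caption of Fig. 1.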



Fig. 1. An example of construction which makes the worst-case ratio tight, with n = 24 
(k = B(24) = 4, N(24) = 5): χ(G, k) = 8 and |S| = 22. 

Remark. Algorithm log-ApproxCIGk produces (P|S,R)-optimal solutions when 
G is a proper interval graph. Besides, in real-life situations [12], the minimum 
value t for which G is $K_{1,t}$-free is generally small (< 9). This allows direct 
partitionings into proper interval subgraphs by Algorithm ColorCliques and also 
yields constant worst-case ratios (< 4) for the criterion P. For example, 
for tasks of 1, 2, 3 or 4 hours, we can obtain a 2-approximation and for tasks of 
1, 2, ..., 8 hours, a 3-approximation. Moreover, Algorithm log-ApproxWSP can 
be easily adapted for circular-arc graphs. In this case, its “real-life” worst-case 
ratio is nearly the same as the one obtained by the classical approach. 

4 Conclusion 

To conclude, we discuss some prospects concerning the complexity of determining a 
minimum partition of an interval graph into proper interval subgraphs. In effect, 
answering the mathematician has provided some hints for answering the 
computer scientist. 

First, we now know that a minimum partition of a $K_{1,5}$-free interval graph G 
into proper interval subgraphs is computed in linear time and space: if G is not 
a proper interval graph, then we can use Lemma 1 to partition G into 2 proper 
interval graphs (recognizing a proper interval graph is done in linear time and 
space [3]). For $K_{1,6}$-free interval graphs (and also for arbitrary interval graphs), 
we conjecture that the problem is NP-complete. 

Finding a polynomial-time approximation algorithm with constant worst-case 
ratio for the problem seems to be difficult too. However, combining the 
previous remark with Lemma 2 enables us to design a linear-time approximation 
algorithm, similar to Algorithm log-ApproxWSP, which achieves the worst-case 
ratio ln n for this problem: if G is not a proper interval graph, then we 
can partition it into $\lfloor \log_3((n+1)/2) \rfloor$ $K_{1,5}$-free interval graphs (each of 
them is then partitioned in linear time into a minimum number of proper interval 
subgraphs). 
Abstract. An n-node tree has to be explored by k mobile agents (robots), starting 
in its root. Every edge of the tree must be traversed by at least one robot, and 
exploration must be completed as fast as possible. Even when the tree is known 
in advance, scheduling optimal collective exploration turns out to be NP-hard. We 
investigate the problem of distributed collective exploration of unknown trees. Not 
surprisingly, communication between robots influences the time of exploration. 
Our main communication scenario is the following: robots can communicate by 
writing at the currently visited node previously acquired information, and reading 
information available at this node. We construct an exploration algorithm whose 
running time for any tree is only a factor O(k/ log k) larger than optimal exploration 
time with full knowledge of the tree. (We say that the algorithm has overhead 
O(k/ log k)). On the other hand we show that, in order to get overhead sublinear 
in the number of robots, some communication is necessary. Indeed, we prove that 
if robots cannot communicate at all, then every distributed exploration algorithm 
works in time a factor Ω(k) larger than optimal exploration time with full knowledge, for 
some trees. 



1 Introduction 

A collection of robots (mobile agents), initially located at one node of an undirected 
connected graph, have to explore this graph and return to the starting point. The graph is 
explored if every edge is traversed by at least one robot. Every robot traverses any edge in 
unit time, and the time of collective exploration is the maximum time used by any robot 
from the group. It turns out that scheduling optimal collective exploration is NP-hard, 
even in the simplest case, when the explored graph is a tree and when it is known in 
advance. However, most often, exploration problems are studied in the case of unknown 
graphs (cf. [1,6,12,14,15,16,17,21]). This is also the approach adopted in the present 
paper. We restrict attention to trees and, unlike in the above quoted papers, we consider 
exploration by many robots. The goal is to collectively explore the tree in the shortest 
possible time. Since the explored tree is not known in advance, a collective exploration 
algorithm can have different performance in different trees. In order to measure the 
quality of such an algorithm, we compare its performance to the performance of the 
optimal exploration algorithm which knows the tree in advance (recall that designing 
such an optimal exploration is NP-hard). A collective exploration algorithm A for k 
robots (working in unknown trees) is said to have overhead Q, if Q is the supremum 
of ratios A(k, T, r)/opt(k, T, r), where A(k, T, r) is the exploration time of tree T by 
algorithm A, when robots start at node r, and opt(k, T, r) is the optimal exploration time 
of T by k robots starting at r, assuming that T and r are known. The supremum is taken 
over all trees T and starting nodes r. Hence overhead is a measure of performance similar 
to competitive ratio for on-line algorithms. We seek collective exploration algorithms 
with low overhead. If the explored tree was known in advance, any exploration algorithm 
could be viewed as centralized, since it could assume knowledge of global history by 
any robot at any step. However, in our case, when the topology of the tree is unknown, 
distributed control of robots implies that their knowledge at any step of the exploration 
depends on communication between them. Below we specify communication scenarios. 

1.1 The Model 

We consider k robots initially located at the root r of an unknown tree T. Robots have 
distinct identifiers. Apart from that, they are identical. Each robot knows its own identifier 
and follows the same exploration algorithm which has the identifier as a parameter. The 
network is anonymous, i.e., nodes are not labeled, and ports at each node have only local 
labels which are distinct integers between 1 and the degree of the node. The robots move 
as follows. At every exploration step, every robot either traverses an edge incident to its 
current position, or remains in the current position. A robot traversing an edge knows 
local port numbers at both ends of the edge. 

Our main communication scenario, called exploration with write-read communica- 
tion, is the following. In every step of the algorithm every robot performs the following 
three actions: it moves to an adjacent node, writes some information in it, and then reads 
all information available at this node, including its degree. Alternatively, a robot can re- 
main in the current node, in which case it skips the writing action. Actions are assumed 
to be synchronous: if A is the set of robots that enter v in a given step, then first all robots 
from A enter v, then all robots from A write and then all robots currently located at v 
(those from A and those that have not moved from v in the current step) read. 

We also consider two extreme communication scenarios. In one, called exploration 
without communication, all robots are oblivious of each other. I.e., at each step, every 
robot knows only the route it traversed until this point (which is the sequence of exit and 
entry port numbers), and degrees of all nodes it visited. In the other, called exploration 
with complete communication, all robots can instantly communicate at each step. 
In all scenarios, a robot, currently located at a node, does not know the other endpoints 
of yet unexplored incident edges. If the robot decides to traverse such a new edge, the 
choice of the actual edge belongs to the adversary, as we are interested in the worst-case 
performance. 

1.2 Our Results 

As a preliminary result, we show that the problem of finding optimal collective ex- 
ploration, if the tree and the starting node are known in advance, is NP-hard. Our main 
result concerns collective distributed exploration of unknown trees by k robots, under the 
write-read communication scenario. We construct an exploration algorithm with overhead 
O(k/ log k). Indeed, our algorithm explores any n-node tree of diameter D in time 
O(D + n/ log k). We first describe our algorithm for the stronger scenario, exploration 
with complete communication, and then we show how to simulate this algorithm in the 
write-read model, without changing time complexity. We also prove that any algorithm 
must have overhead at least 2 − 1/k under the complete communication scenario. (This 
lower bound obviously carries over to the write-read communication scenario.) On the 
other hand we show that, in order to get overhead sublinear in the number of robots, 
some communication is necessary. Indeed, we prove that, under the scenario without 
communication, every distributed collective exploration algorithm must have overhead 
Ω(k). Since this is the overhead of an algorithm using only one out of k robots, our 
lower bound shows that exploration without communication does not allow any effective 
splitting of the task among robots. Comparing the upper bound on time for the 
scenario with write-read communication with the lower bound for the scenario without 
communication shows that this difference of communication capability influences the 
order of magnitude of time of collective exploration. Even limited communication per- 
mitted by our write-read model allows robots to effectively collaborate in executing the 
exploration task. 

1.3 Related Work 

Exploration and navigation problems for robots in an unknown environment have been 
thoroughly investigated in recent literature (cf. the survey [23]). There are two types of 
models for these problems. In one of them a particular geometric setting is assumed, e.g., 
unknown terrain with convex obstacles [11], or room with polygonal [13] or rectangular 
[7] obstacles. Another approach is to model the environment as a graph, assuming that 
the robot may only move along its edges. The graph setting can be further specified in 
two different ways. In [1,8,9,14] the robot explores strongly connected directed graphs 
and it can move only in the direction from head to tail of an edge, not vice-versa. In [6, 
12,15,16,17,21] the explored graph is undirected and the robot can traverse edges in both 
directions. In some papers, additional restrictions on the moves of the robot are imposed. 
It is assumed that the robot has either a restricted tank [6,12], forcing it to periodically 
return to the base for refueling, or that it is tethered, i.e., attached to the base by a rope 
or cable of restricted length [17]. It is proved in [17] that exploration can be done in time 
O(e) under both scenarios, where e is the number of edges in the graph. 

Exploration of anonymous graphs presents a different type of challenge. In this case 
it is impossible to explore arbitrary graphs if no marking of nodes is allowed. Hence the 




P. Fraigniaud et al. 



scenario adopted in [8,9] was to allow pebbles which the robot can drop on nodes to 
recognize already visited ones, and then remove them and drop in other places. In [9] the 
authors compared exploration power of one robot to that of two cooperating robots with 
a constant number of pebbles. In [8] it was shown that one pebble is enough if the robot 
knows an upper bound on the size of the graph, and Θ(log log n) pebbles are necessary 
and sufficient otherwise. 

In all the above papers, except [9], exploration was performed by a single robot. 
Exploration by many robots was investigated mostly in the context of graphs known 
in advance. In [18], approximation algorithms were given for the collective exploration 
problem in arbitrary graphs. In [4,5] the authors constructed approximation algorithms 
for the collective exploration problem in weighted trees. It was also observed in [4] 
that scheduling optimal collective exploration in weighted trees is NP-hard even for 
two robots. However, the argument from [4] does not work if all weights of edges are 
equal to 1, which we assume. It should also be noted that, while in [4,5] exploration 
was centralized, the main focus of this paper is a distributed approach to collective tree 
exploration. 

Another interesting study of collective exploration in unknown environments can be 
found, e.g., in [24,20], in the context of a search problem in geometric trees and simple 
polygons. Finally, collective exploration is also related to the freeze-tag problem [2,3] 
in which a set of “asleep” robots must be awakened, starting with only one “awake” robot. 

2 Exploration with Complete Communication 

It is possible to prove that the problem of scheduling optimal collective exploration, 
if the tree and the starting node are known in advance (i.e., the problem of finding an 
exploration scheme working in time opt{k, T, r)), is NP-hard. However, due to the space 
constraints the proof of this fact is omitted here. 

In this section we describe and analyze an exploration algorithm for k robots, with 
overhead O(k/ log k), under a communication model stronger than write-read 
communication, namely exploration with complete communication. At every step of exploration 
all robots exchange messages containing all information acquired so far. 

We will use the following terminology. We denote by $T_u$ the subtree of the explored 
tree T, rooted at node u. $T_u$ is explored, if every edge of $T_u$ has been traversed by some 
robot. Otherwise, it is called unexplored. $T_u$ is finished, if it is explored and either there 
are no robots in it, or all robots in it are in u. Otherwise, it is called unfinished. $T_u$ is 
inhabited, if there is at least one robot in it. 
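The terminology above translates directly into a small predicate. The sketch below is only illustrative: the tree encoding (a dictionary mapping a node to its list of children), the set of traversed edges and the map from robot identifiers to positions are assumptions made for the example, not data structures prescribed by the paper.

```python
def subtree_status(children, traversed_edges, robot_position, u):
    """Return (explored, finished, inhabited) for the subtree T_u.

    children:        dict node -> list of child nodes (hypothetical encoding)
    traversed_edges: set of (parent, child) pairs already traversed by a robot
    robot_position:  dict robot identifier -> node where the robot currently is
    """
    nodes, edges, stack = set(), [], [u]
    while stack:  # collect the nodes and edges of T_u
        v = stack.pop()
        nodes.add(v)
        for c in children.get(v, []):
            edges.append((v, c))
            stack.append(c)
    inside = [p for p in robot_position.values() if p in nodes]
    explored = all(e in traversed_edges for e in edges)
    inhabited = len(inside) > 0
    # Finished: explored, and every robot still in T_u sits at its root u
    # (this also covers the case of no robots in T_u).
    finished = explored and all(p == u for p in inside)
    return explored, finished, inhabited
```

For instance, with `children = {0: [1, 2], 1: [3]}` and only edge (1, 3) traversed, the subtree rooted at 1 is explored; it is finished exactly when no robot remains at node 3.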

Algorithm Collective Exploration 

Fix a step i of the algorithm and a node v in which some robots are currently located. 
There are three possible (exclusive) cases. 

Case 1. Subtree $T_v$ is finished. 

Action: if $v \neq r$, all robots from v go to the parent of v, else all robots from v stop. 

Case 2. There exists a child u of v such that $T_u$ is unfinished. 

Let $u_1, \dots, u_j$ be children of v for which the corresponding trees are unfinished, ordered 
in increasing order of corresponding local port numbers at v. Let $x_l$ be the number of 
robots currently located in $T_{u_l}$. Partition all robots from v into sets $A_1, \dots, A_j$ of sizes 
$y_1, \dots, y_j$, respectively, so that integers $x_l + y_l$ differ by at most 1. The partition is done 
in such a way that indices l for which integers $x_l + y_l$ are larger by one than for some 
others, form an initial segment $[1, \dots, z]$ in $[1, \dots, j]$. (We will show in the proof of Lemma 
1 that such a partition can be constructed). Moreover, sets $A_l$ are formed one-by-one, 
by inserting robots from v in order of increasing identifiers. (Thus, the partition into 
sets $A_1, \dots, A_j$ can be done distributedly by robots from v, using knowledge that they 
currently have). 

Action: all robots from set $A_l$ go to $u_l$, for $l = 1, \dots, j$. 

Case 3. For all children u of v, trees $T_u$ are finished, but at least one is inhabited. 

Action: all robots from v remain in v. 
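The balancing rule of Case 2 can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the function name `split_robots` is hypothetical, and we read "formed one-by-one in order of increasing identifiers" as handing out consecutive blocks of the sorted identifiers. The sketch raises an error when the current loads cannot be balanced, a situation that Lemma 1 is stated to exclude.

```python
def split_robots(x, robot_ids):
    """Split the robots located at v among the unfinished child subtrees.

    x[l]      : number of robots already in the l-th unfinished subtree
                (children ordered by increasing port number at v)
    robot_ids : identifiers of the robots currently located at v
    Returns a list of lists; entry l holds the robots sent to the l-th child.
    """
    j = len(x)
    total = sum(x) + len(robot_ids)
    base, z = divmod(total, j)
    # Final loads differ by at most 1; the larger ones form the initial segment.
    target = [base + 1 if l < z else base for l in range(j)]
    y = [t - xl for t, xl in zip(target, x)]
    if any(count < 0 for count in y):
        raise ValueError("current loads cannot be balanced from this node")
    ids, groups, start = sorted(robot_ids), [], 0
    for count in y:  # consecutive blocks of increasing identifiers
        groups.append(ids[start:start + count])
        start += count
    return groups
```

Every robot at v can run this computation locally on the same inputs, so all robots obtain the same partition without further coordination.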

The following lemmas will be used in the analysis of this algorithm. 

Lemma 1. Let v be any node of tree T and let i be a fixed step of Algorithm Collective 
Exploration. Then numbers of robots in unfinished subtrees $T_u$, for all children u of v, 
differ by at most 1. 

Lemma 2. Let $T_v$ be a subtree of tree T, and let i be the first step in which a robot 
enters v in the execution of Algorithm Collective Exploration. If $T_v$ has m edges then 
$T_v$ is finished by step i + 2m. 

Lemma 3. Algorithm Collective Exploration works in time O(D + n/ log k) for all 
n-node trees of diameter D. 

Proof. Consider Algorithm Collective Exploration, working on a tree T of diameter D, 
rooted at r. Define a path $S = (a_0, a_1, \dots)$ in T as follows. $a_0 = r$. Suppose that $a_j$ 
is already defined. Among all children of $a_j$ consider those nodes v for which $T_v$ was 
finished last (there can be several such children). Define $a_{j+1}$ to be such a child with 
smallest port label. The length |S| of S is at most D. Intuitively, the path S leads to one 
of the leaves explored very late. 

For any positive integer i and for any $j = 0, \dots, \log k$, denote by $p_i(j)$ the largest 
index of a node v on path S such that there are at least $2^j$ robots in $T_v$ after step i. 
We will say that $p_i(j)$ corresponds to the node with this index. Define nodes $w_i(l)$, for 
$|S| \ge l \ge 1$, as follows. Let $w_i(l)$ denote the l-th node on S which has at least two 
children $u_1$ and $u_2$, such that $T_{u_1}$ and $T_{u_2}$ are inhabited after step i. Let $d_i(l)$, for $l \ge 1$, 
denote the number of such children of node $w_i(l)$. 

Define $i_0$ to be the last step of the algorithm satisfying the following condition: for 
all $i \le i_0$, $p_i(0)$ is smaller than the length of S. We first consider only steps of the 
algorithm until step $i_0$. We define two types of such steps. A step $i \le i_0$ of the algorithm 
is of type 

A. if $\sum_l d_i(l) \ge \frac{1}{2} \log k$; 

B. if $|\{j : p_{i+1}(j) \neq p_i(j)\}| > \frac{1}{4}(\log k + 1)$. 

We now show that all steps of the algorithm until step $i_0$ are of one of the above types. 
The proof of this fact is split into the following three claims. 

Claim 1. Fix a step $i \le i_0$ of the algorithm, and consider a node $w_i(l)$, for some $l \ge 1$. 
Then $|\{j : p_i(j) \text{ corresponds to node } w_i(l)\}| \le d_i(l) + 1$. 

Let v denote the successor of $w_i(l)$ on path S (v exists by definition of $i_0$). Let $j_0$ 
be the smallest element in the set $\{j : p_i(j) \text{ corresponds to node } w_i(l)\}$. The number of 
robots in $T_v$, after step i, is $x < 2^{j_0}$. By the definition of $d_i(l)$ and by Lemma 1, the 
number of robots in $T_{w_i(l)}$ is less than $(x + 1) \cdot d_i(l)$. We have 

(a; + 1) • di{l) < X ■ di{l) + di{l) 

< X ■ 

< a; - +X • 

= X ■ 

Hence, if $p_i(j)$ corresponds to $w_i(l)$ then $j < j_0 + d_i(l) + 1$. This proves Claim 1. 

Claim 2. Fix a step $i \le i_0$ of the algorithm. If $p_i(j)$ does not correspond to any $w_i(l)$, 
for $l \ge 1$, then $p_{i+1}(j) \neq p_i(j)$. 

Consider $p_i(j)$ satisfying the assumption of Claim 2. Let v denote the corresponding 
node on path S, and let v' denote the successor of v on S. The number of robots in $T_v$ is 
equal to the number of robots in v plus the number of robots in $T_{v'}$, in view of the fact 
that $p_i(j)$ does not correspond to any $w_i(l)$ and of the definition of $w_i(l)$. In step i + 1, 
all robots from v move to v', and all robots located in $T_{v'}$ remain in $T_{v'}$. (Indeed, since 
$i \le i_0$, $T_{v'}$ has not yet been explored, hence it has not been finished, and all subtrees 
rooted at siblings of v' are finished and not inhabited, by the assumption that $p_i(j)$ does 
not correspond to any $w_i(l)$.) Hence $p_{i+1}(j)$ corresponds to v'. This proves Claim 2. 

Claim 3. All steps $i \le i_0$ of the algorithm are either of type A or of type B. 

Fix a step $i \le i_0$ of the algorithm and suppose that it is not of type A. Hence 
$\sum_l d_i(l) < \frac{1}{2} \log k$. Since $d_i(l) \ge 2$, for all $l \ge 1$, the number of indices l for which 
$d_i(l)$ are defined, is less than $\frac{1}{4} \log k$. It follows that 
$\sum_l (d_i(l) + 1) < \frac{1}{2} \log k + \frac{1}{4} \log k = \frac{3}{4} \log k$. By Claim 1, the number of 
integers j, such that $p_i(j)$ does not correspond to any $w_i(l)$, for $l \ge 1$, is larger than 
$\log k + 1 - \frac{3}{4} \log k > \frac{1}{4}(\log k + 1)$. By Claim 2, the 
number of integers j, such that $p_{i+1}(j) \neq p_i(j)$, is also larger than $\frac{1}{4}(\log k + 1)$. Hence 
step i is of type B. This proves Claim 3. 

We now estimate the number of steps of type A. Consider all subtrees $T_u$ rooted at 
nodes u outside of S. Let $x_u$ denote the number of edges of $T_u$. We have $\sum_u (x_u + 1) \le n$. 
Let $t_u$ denote the number of steps during which $T_u$ is inhabited. By Lemma 2, 
$\sum_u t_u \le 2n$. In every step i of type A, at least $\sum_l (d_i(l) - 1)$ trees $T_u$ are inhabited 
(subtrees $T_u$ are rooted at nodes u outside of S, hence summands are $d_i(l) - 1$). Since 
$d_i(l) \ge 2$, we have $\sum_l (d_i(l) - 1) \ge (\sum_l d_i(l))/2 \ge \frac{1}{4} \log k$. Hence the number of 
steps of type A is at most 

$$\frac{\sum_u t_u}{\frac{1}{4} \log k} \le \frac{2n}{\frac{1}{4} \log k} = \frac{8n}{\log k}.$$

Next, we estimate the number of steps of type B. We have 

$$\sum_{i \le i_0} |\{j : p_{i+1}(j) \neq p_i(j)\}| = \Big|\bigcup_{i \le i_0} \bigcup_{j=0}^{\log k} \{(i,j) : p_{i+1}(j) \neq p_i(j)\}\Big| = \Big|\bigcup_{j=0}^{\log k} \bigcup_{i \le i_0} \{(i,j) : p_{i+1}(j) \neq p_i(j)\}\Big|$$

$$= \sum_{j=0}^{\log k} |\{(i,j) : i \le i_0,\ p_{i+1}(j) \neq p_i(j)\}| \le (\log k + 1) \cdot |S|,$$

the last inequality following from the fact that before step $i_0$ all moves of robots on S are 
down the path S, and hence, for a given j, the size of the set $\{(i,j) : p_{i+1}(j) \neq p_i(j)\}$ is 
bounded by the length of S. For every step i of type B, we have 
$|\{j : p_{i+1}(j) \neq p_i(j)\}| > \frac{1}{4}(\log k + 1)$, hence the number of steps of type B is at most 
$\frac{(\log k + 1) \cdot |S|}{\frac{1}{4}(\log k + 1)} = 4|S|$. Hence, 
by Claim 3, we have $i_0 \le \frac{8n}{\log k} + 4|S|$. 

We finally show that the algorithm completes exploration by step $i_0 + 1 + |S|$. Let 
$i_1 = i_0 + 1$. Let X be the set of robots that are in the last node b of S after step $i_1$. In 
step $i_1 + 1$, all robots from X go to the parent of b, because b is a leaf. By definition of 
S, when a set of robots containing X moves from a node v' on S to its parent v, then 
$T_{v'}$ is finished and not inhabited, and consequently, by the construction of v', $T_v$ is also 
finished. It follows that in the next step, all robots from v move to the parent of v. Hence 
the number of steps after $i_1$, needed to terminate the algorithm, is |S|. This implies that 
the algorithm terminates by step $i_1 + |S| = i_0 + 1 + |S|$. Hence the running time of the 
algorithm is at most $\frac{8n}{\log k} + 5|S| + 1 \in O(D + n/\log k)$. 



Theorem 1. Algorithm Collective Exploration has overhead O(k/ log k). 

Proof. Consider any n-node tree T rooted at node r. If the diameter of T is at most 
then the theorem follows from Lemma 3, because opt(k, T, r) ≥ 2(n − 1)/k. 
If the diameter of T is larger than then opt(k, T, r) ∈ because at 

least one robot has to visit the leaf farthest from r. By Lemma 2, Algorithm Collective 
Exploration uses time ≤ 2n, hence the overhead is O(k/ log k) in this case as well. 
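The two lower bounds on opt(k, T, r) invoked in this proof can be evaluated for a concrete tree. The sketch below is an illustration under stated assumptions: the tree is given as a dictionary from a node to its list of children (a hypothetical encoding), and the function name `opt_lower_bound` is ours. It returns the larger of 2(n − 1)/k and the distance from r to the farthest leaf, which is a valid lower bound but not the optimal exploration time itself.

```python
from collections import deque


def opt_lower_bound(children, root, k):
    """Lower bound on opt(k, T, r): max(2(n - 1)/k, depth of the farthest leaf)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search to get every node's depth
        v = queue.popleft()
        for u in children.get(v, []):
            depth[u] = depth[v] + 1
            queue.append(u)
    n = len(depth)
    # Each of the n - 1 edges is traversed at least twice overall (robots return
    # to r), and some robot has to reach the leaf farthest from r.
    return max(2 * (n - 1) / k, max(depth.values()))
```

On a path with 4 nodes and k = 3 the depth term dominates, while on a star with 6 leaves and k = 2 the edge-counting term 2(n − 1)/k dominates.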

We conclude this section by stating a lower bound on the overhead of any collective 
exploration under the complete communication scenario. Clearly, this lower bound also 
holds under the write-read communication scenario. The proof is omitted. 

Theorem 2. Any collective exploration algorithm for k robots has overhead ≥ 2 − 1/k. 



3 Exploration with Write-Read Communication 

In this section we show how Algorithm Collective Exploration can be simulated in our 
write-read model, without changing time complexity. Fix any node v of the tree. Let 
i denote the step number, and let p denote the port number at v corresponding to the 
parent of v; in the case v = r, we define p = *. We define the following sets: 

- $\mathcal{P}_i$ is the set of ports at v corresponding to children which are roots of unfinished 
subtrees, 

- $\mathcal{P}'_i \subseteq \mathcal{P}_i$ is the set of ports at v corresponding to children in whose subtrees there 
is one robot more than in subtrees of all other children. In the special case when all 
subtrees of children are inhabited by q robots, we define $\mathcal{P}'_i = \mathcal{P}_i$, if q > 0, and 
$\mathcal{P}'_i = \emptyset$, if q = 0. 

- $\mathcal{R}_i$ is the set of identifiers of robots that are in v after step i − 1. 

Let $\mathcal{K}_i = \{p, \mathcal{P}_i, \mathcal{P}'_i, \mathcal{R}_i\}$, if node v has been visited by step i − 1 of Algorithm 
Collective Exploration. Otherwise $\mathcal{K}_i$ is undefined. We refer to $\mathcal{K}_i$ as the knowledge at 
node v after step i − 1 of Algorithm Collective Exploration. The action performed by 
every robot located at v after step i − 1 depends only on $\mathcal{K}_i$ and on the identifier of the 
robot. Hence Algorithm Collective Exploration defines the following action function H. 
For any step i and any robot R located at v after step i − 1, the value of $H(\mathcal{K}_i, R)$ is one 
of the following: 

- the port number a by which R leaves v in step i, 

- 0, if R remains at v in step i, 

- *, if R stops. 

We construct a simulation of Algorithm Collective Exploration in the write-read 
communication model. The new algorithm is called Algorithm Write-Read. It operates 
in rounds logically corresponding to steps of Algorithm Collective Exploration. Each 
round i > 0 consists of three steps, 3i, 3i + 1, 3i + 2, and round 0 consists of two steps, 
1 and 2. Each step is in turn divided into three stages: in Stage 1 robots move, in Stage 2 
they write information in their location, and in Stage 3 they read information previously 
written in their location. 
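The numbering of steps within rounds can be made explicit with two small helpers. This is only a sketch of the indexing convention stated above; the function names and the 0-based "offset within the round" are illustrative choices, not notation from the paper.

```python
def steps_of_round(i):
    """Steps making up round i: round 0 has steps 1 and 2, round i > 0 has 3i, 3i+1, 3i+2."""
    if i < 0:
        raise ValueError("round numbers are non-negative")
    return [1, 2] if i == 0 else [3 * i, 3 * i + 1, 3 * i + 2]


def round_and_offset(step):
    """Map a step number to (round, offset of the step inside that round)."""
    if step < 1:
        raise ValueError("steps are numbered from 1")
    if step <= 2:
        return 0, step - 1  # round 0 is the special round with only two steps
    return step // 3, step % 3
```

With this numbering, i rounds after round 0 use 3i + 2 steps in total, so simulating one step of Algorithm Collective Exploration costs a constant number of write-read steps.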

Recall that, in the write-read model, any robot R entering node v can write some 
information in this node. In Algorithm Write-Read, a robot R entering node v in step 
i using port a, writes the triplet (i, R, a) at node v. Denote by $\mathcal{I}_i$ the set consisting of the 
degree of v and of all triplets written at node v until step i − 1 of Algorithm Write-Read. 

We now define the knowledge $\hat{\mathcal{K}}_i$ at v after round i − 1 of Algorithm Write-Read. 
If no triplets are written at node v then $\hat{\mathcal{K}}_i$ is not defined. Otherwise, we define 
$\hat{\mathcal{K}}_i = \{p, \hat{\mathcal{P}}_i, \hat{\mathcal{P}}'_i, \hat{\mathcal{R}}_i\}$, where $\hat{\mathcal{P}}_i, \hat{\mathcal{P}}'_i, \hat{\mathcal{R}}_i$ are defined with respect to Algorithm Write-Read 
(after round i − 1) in the same way as $\mathcal{P}_i, \mathcal{P}'_i, \mathcal{R}_i$ were defined with respect to Algorithm 
Collective Exploration (after step i − 1). We will show that, at the beginning of each 
round i of Algorithm Write-Read, any robot located at v knows $\hat{\mathcal{K}}_i$. Moreover, we will 
show that, for any v and any i, $\hat{\mathcal{K}}_i = \mathcal{K}_i$, and that $\hat{\mathcal{K}}_i$ is defined for exactly the same 
nodes as $\mathcal{K}_i$. 

Knowledge $\hat{\mathcal{K}}_i$ is obtained from input $\mathcal{I}_{3i}$ by the following recursive procedure. 

Procedure Knowledge Construction 

Assume that knowledge $\hat{\mathcal{K}}_1$ is undefined at nodes other than r and that it equals 
$\{*, \hat{\mathcal{P}}_1, \hat{\mathcal{P}}'_1, \hat{\mathcal{R}}_1\}$ at the root r, where $\hat{\mathcal{P}}_1$ is the set of all ports of r, $\hat{\mathcal{P}}'_1 = \emptyset$, and $\hat{\mathcal{R}}_1$ is 
the set of all robots. Suppose that we can compute $\hat{\mathcal{K}}_i$ from input $\mathcal{I}_{3i}$, at all nodes v. We 
show how to compute $\hat{\mathcal{K}}_{i+1}$ from $\mathcal{I}_{3i+3}$, at node v. 

(1) If there are no triplets written at node v for steps smaller than 3i (i.e., $\hat{\mathcal{K}}_i$ is undefined) 
but there is some triplet $(3i, R, a) \in \mathcal{I}_{3i+3}$ then we put: p = a (there is exactly one 
such a in this case), $\hat{\mathcal{P}}_{i+1}$ is the set of all ports at v other than a, $\hat{\mathcal{P}}'_{i+1} = \emptyset$, $\hat{\mathcal{R}}_{i+1}$ is the 
set of all robots R, such that a triplet $(3i, R, a) \in \mathcal{I}_{3i+3}$ is written in v. 

(2) Otherwise, we first put $\hat{\mathcal{K}}_{i+1} = \hat{\mathcal{K}}_i$, and then modify $\hat{\mathcal{K}}_{i+1}$, s.t.: p remains unchanged, 
$\hat{\mathcal{P}}_{i+1}$ is the set of all ports from $\hat{\mathcal{P}}_i$, except those ports a, for which there is a triplet 
$(3i + 1, R, a) \in \mathcal{I}_{3i+3}$ at v (we discard those ports by which a robot entered confirming 
that the corresponding subtree is finished), $\hat{\mathcal{P}}'_{i+1}$ contains z initial ports from $\hat{\mathcal{P}}_{i+1}$, where 
z is the integer defined in step i of Algorithm Collective Exploration, $\hat{\mathcal{R}}_{i+1} := (\hat{\mathcal{R}}_i \cup X) \setminus Y$, 
where X is the set of robots R for which $(3i, R, a) \in \mathcal{I}_{3i+3}$, and Y is the set of robots 
R' for which $H(\hat{\mathcal{K}}_i, R') = a \neq 0$ (we add robots that entered v in step 3i and delete 
those that left v in this step). 
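Case (2) of the procedure is essentially a pair of set operations on the triplets stored at v. The sketch below illustrates only that part, under explicit assumptions: triplets are encoded as Python tuples `(step, robot, port)`, the set of robots leaving v in step 3i is passed in directly instead of being computed through the action function H, and the update of the set of ports with one extra robot (which needs the integer z) is deliberately left out. The name `update_ports_and_robots` is hypothetical.

```python
def update_ports_and_robots(ports, robots, triplets, i, leaving):
    """Update the unfinished ports and the robot set at v for round i + 1 (case 2).

    ports    : ports of unfinished children known for round i, in increasing order
    robots   : identifiers of robots known to be at v for round i
    triplets : iterable of (step, robot, port) tuples written at v so far
    leaving  : robots whose action in round i is to leave v through some port
    """
    triplets = list(triplets)
    # Ports through which a robot entered in step 3i + 1 confirm a finished subtree.
    finished_ports = {a for (s, _, a) in triplets if s == 3 * i + 1}
    # Robots that entered v in step 3i are now located at v.
    entered = {r for (s, r, _) in triplets if s == 3 * i}
    new_ports = [a for a in ports if a not in finished_ports]
    new_robots = (set(robots) | entered) - set(leaving)
    return new_ports, new_robots
```

Since a robot cannot both enter and leave v in the same step, adding the entering robots before or after removing the leaving ones gives the same set.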

Algorithm Write-Read 

Round 0 - This is a special round used to distinguish the root r. 

- - Step 1: - - 
stage 1: do nothing. 

stage 2: every robot R writes (1, R, *) at node r. 
stage 3: every robot R reads $\mathcal{I}_2$ at node r. 

- - Step 2: - - (do nothing). 

Round i > 0 - Execution of each round is based on two assumptions. 
The assumptions after round i − 1 are: assumption $A_i$ - the knowledge $\hat{\mathcal{K}}_i$ of Algorithm 
Write-Read is correctly computed by Procedure Knowledge Construction, using $\mathcal{I}_{3i}$, and 
assumption $B_i$ - $\hat{\mathcal{K}}_i = \mathcal{K}_i$, at any node, and $\hat{\mathcal{K}}_i$ is defined for exactly the same nodes 
as $\mathcal{K}_i$. 

The three steps of each round i have the following purpose. Step 3i is used to make 
the actual move of a robot to its new location, according to the simulated Algorithm 
Collective Exploration. Step 3i + 1 is used to temporarily move robots from a node 
whose subtree is finished, to its parent w, in order to update information held at w, 
concerning children with finished subtrees. Step 3i + 2 is used to move back robots that 
temporarily moved in Step 3i + 1. 

- - Step 3i: - - 

stage 1: If R is at node r at the end of round i − 1, and $H(\hat{\mathcal{K}}_i, R) = *$ for node r, then 
R stops. If R is at node v at the end of round i − 1, and $H(\hat{\mathcal{K}}_i, R) = a \notin \{0, *\}$, for 
node v, then R leaves v through port a. 

stage 2: Every robot R that entered v through port a in Stage 1 of Step 3i, writes 
(3i, R, a) in node v. 

stage 3: Every robot R located at v reads $\mathcal{I}_{3i+1}$ (this is information held at v after Stage 
2 of Step 3i). 

- - Step 3i + 1: - - 

stage 1: If $\hat{\mathcal{P}}_i = \emptyset$ then every robot R located at v at the end of Step 3i leaves v through 
port p. 

stage 2: Every robot R that entered v through port a in Stage 1 of Step 3i + 1, writes 
(3i + 1, R, a) at node v. 

stage 3: Every robot R located at v reads $\mathcal{I}_{3i+2}$. 

- - Step 3i + 2: - - 

stage 1: Every robot R that entered v through port a in Step 3i + 1, leaves v through 
port a. 

stage 2: Every robot R that entered v through port a in Stage 1 of Step 3i + 2, writes 
(3i + 2, R, a) in node v. 

stage 3: Every robot R located at v reads $\mathcal{I}_{3i+3}$. 

Remark. The return moves of robots in stage 1 of step 3i + 2 could be avoided. They are 
introduced to simplify analysis of knowledge update, and do not influence exploration 
complexity. 

Lemma 4. Assumptions $A_i$ and $B_i$ from Algorithm Write-Read are satisfied for all i > 0. 

Theorem 3. Algorithm Write-Read works in time O(D + n/ log k) for all n-node trees of 
diameter D. 

Proof. By Lemma 3, it is enough to show that, for every tree T rooted at r, the number 
of rounds used by Algorithm Write-Read is not larger than the number of steps used 
by Algorithm Collective Exploration. Let $i_0$ denote the latter number. By Lemma 4, 
assumptions $A_{i_0}$ and $B_{i_0}$ are satisfied. By assumption $B_{i_0}$, all robots are at the root 
r after round $i_0 - 1$, because they are all at the root after step $i_0 - 1$ of Algorithm 
Collective Exploration. In Step $3i_0$ of Algorithm Write-Read, every robot R performs 
the action given by H on the knowledge computed in Algorithm Write-Read, by assumption 
$A_{i_0}$. This action is equal to $H(\mathcal{K}_{i_0}, R)$, by assumption $B_{i_0}$. By the definition of $i_0$ 
this action is stop. Hence all robots stop after round $i_0$ of Algorithm Write-Read. 



Corollary 1. Algorithm Write-Read has overhead $O(k/\log k)$.



4 Conclusion 

It can be proved (see the full version of this paper) that, in the absence of communication
between the robots, the overhead of any exploration algorithm is $\Omega(k)$, i.e., of the
same order of magnitude as if only one out of k robots were used to explore the tree.
In contrast, we showed that collective tree exploration can be done faster if robots have
some communication capabilities. This result should be considered a first step in the
study of the impact of communication between robots on the efficiency of collective
network exploration. Several related problems remain open, including: (1) find a tree
exploration algorithm with constant overhead in the complete communication scenario;
(2) find a good lower bound on the overhead of tree exploration for the write-read model;
(3) generalize our results to exploration of arbitrary networks; and (4) consider other
communication models in the context of collective network exploration.
Abstract. We introduce a new type of Steiner points, called off-centers,
as an alternative to circumcenters, to improve the quality of Delaunay
triangulations. We propose a new Delaunay refinement algorithm based
on iterative insertion of off-centers. We show that this new algorithm
has the same quality and size optimality guarantees as the best known
refinement algorithms. In practice, however, the new algorithm inserts
about 40% fewer Steiner points (hence runs faster) and generates
triangulations that have about 30% fewer elements compared with the
best previous algorithms.

Keywords. Delaunay refinement, computational geometry, triangula- 
tions 



1 Introduction 

Meshes are heavily used in many applications including engineering simulations,
computer-aided design, solid modeling, computer graphics, and scientific
visualization. Most of these applications require that the mesh elements be of
good quality and that the size of the mesh be small. An element is said to
be good if its aspect ratio (circumradius over inradius) is bounded from above
or its smallest angle is bounded from below. Mesh element quality is critical in
determining interpolation error in the applications and hence is an important
factor in the accuracy of simulations as well as the convergence speed. Mesh size,
meaning the number of elements, is also a big factor in the running time of the
application algorithm. Between two meshes with the same quality bound, the
one with fewer elements is almost always preferred.

Among several types of domain discretizations, unstructured meshes, in
particular Delaunay triangulations, are quite popular due to their theoretical
guarantees as well as their practical performance. The earliest algorithms that
provide both size optimality and quality guarantees used balanced quadtrees to
generate first a nicely spread point set and then the Delaunay triangulation of these
points [1]. Subsequently, Delaunay refinement techniques were developed based on
an incremental point insertion strategy and provide the same theoretical
guarantees [9]. Over the last decade, Delaunay refinement has become much more
popular than the quadtree-based algorithms, mostly due to its superior
performance in generating smaller meshes. Many versions of Delaunay refinement
have been suggested in the literature [2,6,7,8,9,10,11]. We attribute the large
amount of research on Delaunay refinement to its impact on a wide range of
applications. It is important to generalize the input domains on which Delaunay
refinement works, as well as to improve the performance of the algorithm. Even a
small but reasonable reduction in mesh size translates to important savings in the
running time of the subsequent application algorithm.

The first step of a Delaunay refinement algorithm is the construction of a 
constrained or conforming Delaunay triangulation of the input domain. This ini- 
tial Delaunay triangulation is likely to have bad elements. Delaunay refinement 
then iteratively adds new points to the domain to improve the quality of the 
mesh and to ensure that the mesh conforms to the boundary of the input do- 
main. The points inserted by the Delaunay refinement are called Steiner points. 
A sequential Delaunay refinement algorithm typically adds one new vertex on 
each iteration. Each new vertex is chosen from a set of candidates — the cir- 
cumcenters of bad triangles (to improve mesh quality) and the mid-points of 
input segments (to conform to the domain boundary). Ruppert [9] was the first 
to show that proper application of Delaunay refinement produces well-shaped 
meshes in two dimensions whose size is within a constant factor of the best 
possible. There are efficient implementations [10] as well as three-dimensional 
extensions of Delaunay refinement [4,10]. 

In this paper, we introduce a new type of Steiner points, called off-centers,
as an alternative to circumcenters and propose a new Delaunay refinement
algorithm. We show that this new algorithm has the same theoretical guarantees as
Ruppert’s algorithm, and hence generates quality-guaranteed size-optimal
meshes. Moreover, our experimental study indicates that our Delaunay refinement
algorithm with off-centers inserts about 40% fewer Steiner points than the
circumcenter insertion algorithms and results in meshes about 30% smaller in the
number of elements. This implies a substantial reduction not only in mesh
generation time, but also in the running time of the application algorithm. For
instance, a quadratic-time application algorithm, if run on the new meshes, would
take about half the time it takes on the old meshes.
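
The "about half" estimate follows from simple arithmetic: a mesh with about 30% fewer elements has roughly 0.7 times the original size, and a quadratic-time algorithm scales with the square of that ratio. The following minimal sketch illustrates the calculation; the function name `relative_running_time` and the pure power-law cost model are assumptions made for illustration only.

```python
def relative_running_time(size_ratio: float, exponent: float) -> float:
    """Running time on the smaller mesh relative to the original mesh,
    assuming (hypothetically) that cost grows as (mesh size) ** exponent."""
    return size_ratio ** exponent


# A mesh with about 30% fewer elements has size ratio 0.7.
quadratic = relative_running_time(0.7, 2)  # about 0.49, i.e. roughly half
linear = relative_running_time(0.7, 1)     # 0.7 for a linear-time algorithm
```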

2 Preliminaries 

In two dimensions, the input domain Ω is represented as a planar straight line
graph (PSLG), that is, a proper planar drawing in which each edge is mapped to
a straight line segment between its two endpoints [9]. The segments express
the boundaries of Ω and the endpoints are the vertices of Ω. The vertices and
boundary segments of Ω will be referred to as the input features. A vertex is
incident to a segment if it is one of the endpoints of the segment. Two segments
are incident if they share a common vertex. In general, if the domain is given as
a collection of vertices only, then the boundary of its convex hull is taken to be
the boundary of the input.

The diametral circle of a segment is the circle whose diameter is the segment. 
A point is said to encroach a segment if it is inside the segment’s diametral circle. 
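
The encroachment test can be sketched in a few lines. A point lies strictly inside the diametral circle of a segment exactly when it sees the segment under an obtuse angle, so a single dot product suffices. The function name `encroaches` and the tuple representation of points are assumptions of this sketch.

```python
def encroaches(p, a, b):
    """Return True if point p lies strictly inside the diametral circle
    of segment ab, i.e. the angle a-p-b is obtuse.

    Points are (x, y) tuples; points on the circle do not encroach."""
    dot = (a[0] - p[0]) * (b[0] - p[0]) + (a[1] - p[1]) * (b[1] - p[1])
    return dot < 0
```

For the segment from (0, 0) to (2, 0), the point (1, 0.5) encroaches, while (1, 1), which lies on the diametral circle, and (1, 2), which lies outside it, do not.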

Given a domain Ω embedded in $\mathbb{R}^2$, the local feature size of each point
$x \in \mathbb{R}^2$, denoted by $\mathrm{lfs}_\Omega(x)$, is the radius of the smallest disk centered
at x that touches two non-incident input features. This function is proven [9] to have
the so-called Lipschitz property, i.e., $\mathrm{lfs}_\Omega(x) \le \mathrm{lfs}_\Omega(y) + |xy|$, for any two
points $x, y \in \mathbb{R}^2$.

Let P be a point set in $\mathbb{R}^d$. A simplex τ formed by a subset of the points of P
is a Delaunay simplex if there exists a circumsphere of τ whose interior does
not contain any points in P. This empty sphere property is often referred to as
the Delaunay property. The Delaunay triangulation of P, denoted Del(P), is a
collection of all Delaunay simplices. If the points are in general position, that is,
if no d + 2 points in P are co-spherical, then Del(P) is a simplicial complex. The
Delaunay triangulation of a point set of size n can be constructed in $O(n \log n)$
time in two dimensions [5].
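
In two dimensions the empty sphere property reduces to the classical in-circle determinant. The sketch below is a brute-force check of the Delaunay property for a single triangle, not an $O(n \log n)$ construction; the names `orient`, `in_circumcircle` and `is_delaunay_triangle` are illustrative, and plain floating-point arithmetic is assumed (a robust implementation would use exact predicates).

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    det = ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
           - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))
    # The sign convention of the determinant assumes counterclockwise abc.
    if orient(a, b, c) < 0:
        det = -det
    return det > 0


def is_delaunay_triangle(tri, points):
    """True if no point of `points` other than the triangle's own
    vertices lies strictly inside the circumcircle of `tri`."""
    a, b, c = tri
    return not any(in_circumcircle(a, b, c, p) for p in points if p not in tri)
```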

In the design and analysis of the Delaunay refinement algorithms, a common 
assumption made for the input PSLG is that the input segments do not meet at 
junctions with small angles. Ruppert [9] assumed, for instance, that the smallest 
angle between any two incident input segments is at least 90°. A typical Delaunay
refinement algorithm may start with the constrained Delaunay triangulation [3] 
of the input vertices and segments or the Delaunay triangulation of the input 
vertices. In the latter case, the algorithm first splits the segments that are en- 
croached by the other input features. Alternatively, for simplicity, we can assume 
that no input segment is encroached by other input features. A preprocessing 
algorithm, which is also parallelizable, to achieve this assumption is given in [11]. 

The radius-edge ratio of a triangle is the ratio of its circumradius to the length
of its shortest side. A triangle is considered bad if its radius-edge ratio is larger
than a pre-specified constant $\beta > \sqrt{2}$. This quality measure is equivalent to
other well-known quality measures, such as smallest angle and aspect ratio, in
two dimensions [9].
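
As an illustration of this quality measure, the sketch below computes the radius-edge ratio of a triangle and the smallest-angle bound that corresponds to a given β. Because the shortest side lies opposite the smallest angle θ and has length $2R \sin\theta$, the ratio equals $1/(2 \sin\theta)$. The function names are assumptions of this sketch, and `math.dist` requires Python 3.8 or later.

```python
import math


def radius_edge_ratio(a, b, c):
    """Circumradius of triangle abc divided by its shortest side length."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    twice_area = abs((b[0] - a[0]) * (c[1] - a[1])
                     - (b[1] - a[1]) * (c[0] - a[0]))
    circumradius = la * lb * lc / (2.0 * twice_area)  # R = abc / (4 * area)
    return circumradius / min(la, lb, lc)


def min_angle_bound_degrees(beta):
    """Smallest angle guaranteed when the radius-edge ratio is at most beta."""
    return math.degrees(math.asin(1.0 / (2.0 * beta)))


def is_bad(a, b, c, beta):
    """A triangle is bad if its radius-edge ratio exceeds beta."""
    return radius_edge_ratio(a, b, c) > beta
```

With β equal to $\sqrt{2}$, the corresponding smallest-angle bound is roughly 20.7 degrees.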



3 Delaunay Refinement with Off-Centers 

3.1 Off-Centers 

The line that goes through the midpoint of an edge of a triangle and its
circumcenter is called the bisector of the edge. Given a bad triangle pqr, suppose
that its shortest edge is pq. Let $c_1$ denote the circumcenter of pqr. We define the
off-center to be the circumcenter of pqr if the radius-edge ratio of $pqc_1$ is smaller
than or equal to β (Figure 1 (a)). Otherwise, the off-center is the point on the
bisector (and inside the circumcircle) that makes the radius-edge ratio of the
triangle based on p, q and the off-center itself exactly β (Figure 1 (b)). The
circle that is centered at the off-center and goes through the endpoints of the
shortest edge is called the off-circle. In the first case, the off-circle is the same as
the circumcircle of the triangle. A bad triangle can have two shortest edges. In such
cases, the off-center is defined once we arbitrarily choose one of the two edges
as the shortest.

Fig. 1. The off-center and the circumcenter of triangle pqr are labeled c and $c_1$,
respectively. The circumcenter of pqc is labeled $c_2$. If $|cc_2| \le \beta|pq|$ then $c = c_1$ (a).
Otherwise, $c \ne c_1$ and by construction $|cc_2| = \beta|pq|$ (b). The off-circle of pqr is the
same as the circumcircle in (a) and is shown as a dashed circle in (b).
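
The definition of the off-center can be turned into a short computation. For a point on the bisector at distance h from the midpoint m of pq, the triangle formed with p and q has circumradius $(h^2 + |pq|^2/4)/(2h)$; setting this equal to $\beta|pq|$ and taking the larger root gives $h = |pq|(\beta + \sqrt{\beta^2 - 1/4})$. The sketch below is one possible reading of the definition under the assumptions that pq is the shortest edge of a bad triangle pqr and that floating-point arithmetic is acceptable; the names `circumcenter` and `off_center` are illustrative.

```python
import math


def circumcenter(a, b, c):
    """Circumcenter of the non-degenerate triangle abc."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)


def off_center(p, q, r, beta):
    """Off-center of a bad triangle pqr whose shortest edge is pq (assumed)."""
    length = math.dist(p, q)
    m = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    c1 = circumcenter(p, q, r)
    d1 = math.dist(m, c1)
    # Distance from m at which triangle (p, q, point) has circumradius
    # exactly beta * |pq| (the larger root of the quadratic in h).
    h = length * (beta + math.sqrt(beta * beta - 0.25))
    if d1 <= h:
        return c1  # radius-edge ratio of p, q, c1 is already at most beta
    t = h / d1     # move from m toward c1 along the bisector
    return (m[0] + t * (c1[0] - m[0]), m[1] + t * (c1[1] - m[1]))
```

For a long, thin triangle such as p = (0, 0), q = (1, 0), r = (0.5, 10), the off-center lies on the bisector well below the circumcenter, whereas for r = (0.5, 3) it coincides with the circumcenter.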

Notice in Figure 1 (b) that, if we were to insert the circumcenter $c_1$, the
triangle $pqc_1$ would still be bad and require another circumcenter insertion. We
instead suggest inserting just the off-center c. This, of course, is a simplified
picture and the actual behavior of Delaunay refinement is more complicated.
Nevertheless, this very observation is the main intuition behind the expectation
of smaller meshes. In other words, around a small feature we create a good
element with the longest possible new features.



3.2 Algorithm 

At each iteration, we choose a new point for insertion from a set of candidate
points. There are two kinds of candidate points: (1) the off-centers of bad
triangles, and (2) the midpoints of segments. Let $\mathcal{C}$ denote the set of all candidate
off-centers that do not encroach any segment. Let $\dot{\mathcal{C}}$ denote their corresponding
off-circles. Similarly, let $\mathcal{B}$ denote the set of all candidate off-centers that do
encroach some segment. Candidate off-centers of this second type are rejected
from insertion. Let $\dot{\mathcal{B}}$ denote their corresponding off-circles. The midpoint of a
boundary segment is a candidate for insertion if the segment is encroached by an
off-center in $\mathcal{B}$. Let $\mathcal{D}$ be the set of all midpoint candidates. Then we suggest the
following algorithm to incrementally insert the candidate points.
Algorithm 1 Delaunay Refinement with Off-centers
Input: A PSLG domain Ω in $\mathbb{R}^2$

Let T be the Delaunay triangulation of the vertices of Ω.

Compute $\mathcal{B}$, $\mathcal{C}$, $\mathcal{D}$;

while $\mathcal{C} \cup \mathcal{D}$ is not empty do

    Choose a point q from $\mathcal{C} \cup \mathcal{D}$ and insert q into the triangulation. If q is a
    midpoint of a segment s, replace s with two segments from q to each endpoint of s;

    Update the Delaunay triangulation T and recompute $\mathcal{B}$, $\mathcal{C}$, $\mathcal{D}$.

end while



4 Termination and Size Optimality 

When analyzing his algorithm, Ruppert [9] used the Delaunay property on the
bad triangles, that is, their circumcircles are empty of other points.
Unfortunately, the off-circles are not necessarily empty of other points. There is a
small crescent-shaped, possibly non-empty region of each off-circle outside the
corresponding circumcircle. This raises a challenge in our analysis. One easy way
around this is to use a special insertion order among the off-centers. For
instance, it is relatively easy to prove that the off-circle of the bad triangle that
has the shortest edge is empty of all other points. Alternatively, an ordering
that favors the bad triangles with the smallest circumradius serves the same
purpose. We could use one of these ordering strategies and apply the same
arguments given in [9]. However, for the sake of a generic result, we opt for an
arbitrary order in the analysis of our off-center insertion algorithm.

We prove that the meshes generated by the off-center insertion algorithm
are size optimal using the same machinery as Ruppert [9]. Moreover, we adopt
the terminology introduced in [10], which includes a clearer rewrite of Ruppert’s
results. We first prove that the edge length function is within a constant factor
of the local feature size. Then, we conclude that the output mesh is size-optimal
within a constant.

Let the insertion length of a vertex u, denoted $r_u$, be the length of the shortest
edge incident to u right after u is inserted (or the length it would have if u were
inserted, when u is rejected for encroaching). If u is an input vertex, its insertion
length is the length of the shortest edge incident to u in the initial Delaunay
triangulation of the input. Also, for each Steiner vertex u, we define a parent
vertex, denoted $\hat{u}$, as the most recently inserted endpoint of the shortest edge of
the bad triangle responsible for the insertion of u. This definition also applies to
vertices that are considered but not actually inserted due to encroachment.

Lemma 1. Let pqr be a bad triangle with off-center u. Then, $r_u \ge C_0 |u\hat{u}|$, for
some constant $C_0$. Moreover, $r_u \ge \beta r_{\hat{u}}$.



Proof. Without loss of generality, let pq be the shortest edge of pqr and $\hat{u} = p$.
Consider the following two cases:

- u is the circumcenter of pqr: By the Delaunay property, $r_u \ge |u\hat{u}|$, that is
  $C_0 = 1$. Moreover, since the triangle pqr is bad, $|up|/|pq| > \beta$. The distance
  from p to q is at least $r_p = r_{\hat{u}}$. Hence, $r_u \ge \beta r_{\hat{u}}$.

- u is not the circumcenter of pqr: Let m be the midpoint of the segment pq and
  $c_2$ be the circumcenter of pqu. See Figure 1. The intersection of the off-circle
  and the circumcircle is empty by the Delaunay property. So, as a conservative
  bound, $r_u$ is at least $|um|$. By construction, $\angle pum = \arcsin(\frac{1}{2\beta})/2$. Also, on
  the right triangle pum, $\cos(\angle pum) = |um|/|u\hat{u}|$. Since $\beta > \sqrt{2}$,
  $|um| \ge |u\hat{u}| \cos(\arcsin(\frac{1}{2\sqrt{2}})/2)$. So, $C_0 = \cos(\arcsin(\frac{1}{2\sqrt{2}})/2) \approx 0.98$. Moreover,

  $r_u \ge |um| \ge |uc_2|$   (because $\angle pc_2q < 90°$)
  $= \beta|pq|$   (by construction)
  $\ge \beta r_{\hat{u}}$. □
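
The value of the constant in the second case can be checked numerically. The sketch below evaluates $\cos(\arcsin(\frac{1}{2\beta})/2)$ directly and also recomputes it from the geometry of the isosceles triangle pqu, assuming |pq| = 1 and the off-center at distance $\beta + \sqrt{\beta^2 - 1/4}$ from m; both function names are hypothetical.

```python
import math


def c0(beta):
    """cos(arcsin(1 / (2 * beta)) / 2), the bound on |um| / |up|."""
    return math.cos(math.asin(1.0 / (2.0 * beta)) / 2.0)


def c0_from_geometry(beta):
    """The same ratio |um| / |up|, computed from side lengths with |pq| = 1."""
    um = beta + math.sqrt(beta * beta - 0.25)  # distance from u to midpoint m
    up = math.hypot(um, 0.5)                   # distance from u to p
    return um / up


worst_case = c0(math.sqrt(2.0))  # about 0.98, the smallest value for beta >= sqrt(2)
```

Since the expression increases with β, its value at $\beta = \sqrt{2}$ is a valid constant for every larger β.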

Lemma 2. For each vertex u, either $r_u \ge \mathrm{lfs}_\Omega(u)$ or $r_u \ge C_1 r_{\hat{u}}$, for some
constant $C_1$.

Proof. We consider the following cases:

- u is not a Steiner vertex: Then, its nearest neighbor in the initial
  triangulation is at least $\mathrm{lfs}_\Omega(u)$ away, hence $r_u \ge \mathrm{lfs}_\Omega(u)$.

- u is an off-center Steiner vertex: Then, by Lemma 1 we know that $r_u \ge \beta r_{\hat{u}}$,
  that is $C_1 = 1$.

- u is the midpoint of an encroached subsegment s: If $\hat{u}$ is an input vertex, or
  is a Steiner vertex on a segment, then $r_u \ge \mathrm{lfs}_\Omega(u)$. Otherwise, $\hat{u}$ is an
  encroaching rejected off-center. Let v be the nearest endpoint of s from
  $\hat{u}$. By definition, $r_{\hat{u}}$ is at most $|\hat{u}v|$. Moreover, since $\hat{u}$ is inside the diametral
  circle of s, $|\hat{u}v| \le \sqrt{2} r_u$. Therefore, $r_u \ge r_{\hat{u}}/\sqrt{2}$, that is $C_1 = 1/\sqrt{2}$. □

Theorem 1. The Delaunay Refinement with Off-centers algorithm terminates.

Proof. Let lfs be the smallest distance between two non-incident features of the
input PSLG. We prove by contradiction that no edge shorter than lfs is
introduced during the refinement. Suppose e is the first edge that is shorter
than lfs. Then, at least one endpoint of e is a Steiner vertex. Let v be the most
recently inserted endpoint of e, let $\hat{v}$ be the parent of v, and let $\bar{v}$ be the
grandparent of v.

- If v is the off-center of a bad triangle, then by Lemma 2, $r_v \ge \beta r_{\hat{v}}$.

- If v is the midpoint of an encroached segment then there are two sub-cases.
  If $\hat{v}$ is the off-center of a bad triangle, then by Lemma 2,
  $r_v \ge r_{\hat{v}}/\sqrt{2} \ge \beta r_{\bar{v}}/\sqrt{2} \ge r_{\bar{v}}$. Otherwise, $\hat{v}$ is on a non-incident segment
  because of the PSLG input assumption. Then, clearly $r_v \ge \mathrm{lfs}$.

In all cases, $r_v \ge r_u$ for some ancestor u of v. If $r_v < \mathrm{lfs}$, then $r_u < \mathrm{lfs}$,
contradicting the assumption that e was the first such edge. Hence, the termination of
the algorithm follows. This also implies that there are no bad triangles in the
output mesh. □
For each vertex u, let $D_u$ be the ratio of $\mathrm{lfs}_\Omega(u)$ over $r_u$.

Lemma 3. If $r_u \ge r_{\hat{u}}/C_2$ for some constant $C_2$, then $D_u \le 1/C_0 + C_2 D_{\hat{u}}$.

Proof.
  $D_u = \mathrm{lfs}_\Omega(u)/r_u \le (\mathrm{lfs}_\Omega(\hat{u}) + |u\hat{u}|)/r_u$   (by the Lipschitz property)
  $\le (D_{\hat{u}} r_{\hat{u}} + r_u/C_0)/r_u$   (by definition and Lemma 1)
  $\le (D_{\hat{u}} C_2 r_u + r_u/C_0)/r_u$
  $= C_2 D_{\hat{u}} + 1/C_0$. □



Lemma 4. There exist fixed constants $C_T > 1$ and $C_S > 1$ such that, for each
vertex u, $D_u \le C_T$ if u is a Steiner or rejected off-center vertex and $D_u \le C_S$
if u is a midpoint Steiner vertex.

Proof. We prove the lemma by induction.

Basis: If u is an input vertex or on a segment, then $D_u = \mathrm{lfs}_\Omega(u)/r_u \le 1$.
Induction hypothesis: The lemma holds for the parent vertex $\hat{u}$. So, $D_{\hat{u}} \le \max\{C_T, C_S\}$.
Induction: Now we make a case analysis:

- If u is an off-center of a bad triangle, then by Lemma 3 (where $C_2 = 1/\beta$ by
  Lemma 1) and the induction hypothesis, $D_u \le 1/C_0 + \max\{C_T, C_S\}/\beta$. This
  implies that $D_u \le C_T$ if

  $C_T \ge 1/C_0 + \max\{C_T, C_S\}/\beta$   (1)

- Otherwise, u is a midpoint of a subsegment s. If the parent $\hat{u}$ is an input vertex
  or on another segment, the lemma holds by the basis of the induction. If $\hat{u}$ is a
  rejected off-center of a bad triangle, then by Lemma 2, $r_u \ge r_{\hat{u}}/\sqrt{2}$. So, by
  Lemma 3 (where $C_2 = \sqrt{2}$) and the induction hypothesis, $D_u \le 1/C_0 + \sqrt{2} C_T$.
  This implies that $D_u \le C_S$ if

  $C_S \ge 1/C_0 + \sqrt{2} C_T$   (2)

We choose $C_S = \frac{(1+\sqrt{2})\beta}{C_0(\beta - \sqrt{2})}$ and $C_T = \frac{\beta + 1}{C_0(\beta - \sqrt{2})}$ to satisfy both Inequalities
(1) and (2). Hence the lemma holds. □
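
As a sanity check on the induction, one can verify numerically that a concrete choice of constants satisfies both inequalities. The sketch below assumes the closed forms $C_T = (\beta+1)/(C_0(\beta-\sqrt{2}))$ and $C_S = (1+\sqrt{2})\beta/(C_0(\beta-\sqrt{2}))$, which are the values obtained by solving Inequalities (1) and (2) with equality, and it takes $C_0$ to be the conservative value from Lemma 1 evaluated at $\beta = \sqrt{2}$. The function names and the tolerance are assumptions of the sketch.

```python
import math

SQRT2 = math.sqrt(2.0)
# Conservative C_0 from Lemma 1, evaluated at beta = sqrt(2) (about 0.98).
C0 = math.cos(math.asin(1.0 / (2.0 * SQRT2)) / 2.0)


def lemma4_constants(beta):
    """Return (C_T, C_S) solving Inequalities (1) and (2) with equality."""
    if beta <= SQRT2:
        raise ValueError("the closed forms require beta > sqrt(2)")
    c_t = (beta + 1.0) / (C0 * (beta - SQRT2))
    c_s = (1.0 + SQRT2) * beta / (C0 * (beta - SQRT2))
    return c_t, c_s


def satisfies_inequalities(beta, rel_tol=1e-9):
    """Check Inequalities (1) and (2) up to floating-point rounding."""
    c_t, c_s = lemma4_constants(beta)
    ineq1 = c_t * (1.0 + rel_tol) >= 1.0 / C0 + max(c_t, c_s) / beta
    ineq2 = c_s * (1.0 + rel_tol) >= 1.0 / C0 + SQRT2 * c_t
    return ineq1 and ineq2
```

Both constants grow without bound as β approaches $\sqrt{2}$ from above, which is why the analysis needs a quality bound strictly larger than $\sqrt{2}$.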



Lemma 5. For each vertex u of the output mesh, its nearest neighbor vertex v
is at a distance at least $C_3\,\mathrm{lfs}_\Omega(u)$ for some constant $C_3$.

Proof. By Lemma 4, $\mathrm{lfs}_\Omega(u)/r_u \le C_S$, for any vertex u. If u was inserted after
v, then $|uv|$ is at least $r_u$. Hence, $|uv| \ge r_u \ge \mathrm{lfs}_\Omega(u)/C_S$, and the lemma holds.
If v was inserted after u, then by Lemma 4 $|uv| \ge r_v \ge \mathrm{lfs}_\Omega(v)/C_S$. By the Lipschitz
property, $|uv| \ge (\mathrm{lfs}_\Omega(u) - |uv|)/C_S$. Hence, $|uv| \ge \mathrm{lfs}_\Omega(u)/(C_S + 1)$, that is,
$C_3 = 1/(C_S + 1)$. □
Local feature size for an output mesh M (which is a PSLG) is well-defined
and denoted by $\mathrm{lfs}_M(\cdot)$. The previous lemma essentially states that
$\mathrm{lfs}_M(x) \ge \mathrm{lfs}_\Omega(x)$ up to a constant factor, $\forall x \in M$. We next state a theorem
proven by Ruppert [9], which together with Lemma 5 leads to Theorem 3, the main
result of this section.

Theorem 2 ([9]). Suppose a triangulation M with radius-edge ratio bound β
has the property that there is some constant $C_4$ such that $\mathrm{lfs}_M(p) \ge \mathrm{lfs}_\Omega(p)/C_4$,
$\forall p \in M$. Then, the size of M is less than $C_5$ times the size of any triangulation
of the input Ω with bounded radius-edge ratio β, where $C_5 = O(C_4^2 \beta)$.

Theorem 3. The Delaunay Refinement with Off-centers algorithm 
generates a size-optimal mesh. 



5 Experiments 

Implementing the Delaunay refinement with off-centers is as simple as replacing
the circumcenter procedure in classical Delaunay refinement implementations
with a new off-center procedure. Computing an off-center is very similar to
computing a circumcenter and takes roughly the same time. Hence, the savings in the
number of Steiner points reported below also reflect the savings in mesh generation time.




Fig. 2. Percentage savings in the number of Steiner points (% Savings_S) and in the
mesh size (% Savings_M). (a) The number of input points is 10K and the minimum
angle threshold (horizontal axis, in degrees) samples the interval [2°-34°]. (b) The
minimum angle threshold is 30° and the number of input points (horizontal axis)
samples [10K-100K].



Earlier experiments with the circumcenter insertion method indicate that the
insertion order has an impact on the output mesh size. For instance, inserting
the circumcenters of the worst triangles first tends to result in smaller meshes. In
this study, for fairness of comparison, we chose the ordering strategy that performs
the best for the circumcenter insertion and use the same for the off-center
insertion. For Delaunay refinement with circumcenters we used the CMU software
triangle¹ [10], which is reported to have over a thousand users.






160 A. Ungor 




Fig. 3. Input consists of 500 points. Smallest angle in both meshes is 29°. Circumcenter
insertion adds 2399 Steiner points, resulting in a mesh with 4579 triangles (a). Off-center
insertion adds 1843 Steiner points, resulting in a mesh with 3479 triangles (b).




Fig. 4. Input PSLG is a plate with five holes described by 64 points and 64 segments.
Smallest angle in the initial triangulation (a) is about 1°. Smallest angle in both
output triangulations is 34°. Circumcenter insertion (triangle software) introduces 1984
Steiner points, resulting in a mesh with 3910 triangles (b). Off-center insertion introduces
only 305 Steiner points, resulting in a mesh with 601 triangles (c).



Figure 2 illustrates a summary of our experimental results on randomly
generated point sets. Let $S_c$ and $S_o$ be the number of Steiner points inserted by the
circumcenter and the off-center insertion methods, respectively. Also, let $M_c$ and
$M_o$ be the number of elements generated by the circumcenter and the off-center
insertion methods, respectively. We report the following two measures:

$$\mathrm{Savings}_S = \frac{S_c - S_o}{S_c}, \qquad \mathrm{Savings}_M = \frac{M_c - M_o}{M_c}.$$
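
As a worked example of the two measures, the sketch below applies the relative-savings formula, (circumcenter count minus off-center count) divided by the circumcenter count, to the Steiner point and triangle counts quoted in the captions of Figures 3 and 4. The function name `savings` and the variable names are illustrative.

```python
def savings(circumcenter_count, off_center_count):
    """Relative savings of off-center insertion over circumcenter insertion."""
    return (circumcenter_count - off_center_count) / circumcenter_count


# Counts taken from the captions of Figures 3 and 4.
fig3_steiner = savings(2399, 1843)  # Savings_S for the 500-point input
fig3_mesh = savings(4579, 3479)     # Savings_M for the 500-point input
fig4_steiner = savings(1984, 305)   # Savings_S for the plate with five holes
fig4_mesh = savings(3910, 601)      # Savings_M for the plate with five holes
```

On these two inputs the savings come to roughly 23-24% for the random point set of Figure 3 and roughly 85% for the plate of Figure 4.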



Percentage savings both in the number of Steiner points and in the mesh size
increase as the user-specified minimum angle threshold gets higher (Figure
2 (a)). We also observed that for a given threshold angle, the savings remain
consistent as we change the input size (Figure 2 (b)). For a visual comparison of
the off-center and the circumcenter insertion algorithms see Figure 3, where the
input is a randomly generated point set.

¹ Available at http://www-2.cs.cmu.edu/~quake/triangle.html



6 Discussions 

By definition, the off-center of some triangles is the same as their circumcenter.
The off-center and the circumcenter insertion algorithms are likely to generate very
similar (sometimes the same) meshes when the initial triangulation is reasonably
good to begin with. In most applications, however, tiny angles are ubiquitous in
the initial Delaunay triangulation. Figure 4 demonstrates the output of the two
algorithms in one such case. In this example, our off-center insertion algorithm
gives a mesh that is about a factor of six smaller than the output of triangle. We
also observed many other examples where the off-center insertion algorithm
terminates (computing a quality-bounded mesh) and triangle does not.

This new insertion scheme also leads to a parallel Delaunay refinement
algorithm that takes only $O(\log(L/h))$ iterations to generate quality-guaranteed
size-optimal meshes, where L is the diameter of the domain, and h is the smallest
edge length in the initial triangulation. This is an improvement over the
previously best known equivalent algorithm, which runs in $O(\log^2(L/h))$ iterations [11].
Due to space limitations, we do not include the description and the analysis of
the new parallel off-center insertion algorithm in this publication. Furthermore,
we plan to extend the off-center algorithm to three dimensions and explore its
benefits on both the theoretical and practical fronts.
Abstract. We present space-efficient algorithms for computing the
convex hull of a simple polygonal line in-place, in linear time. It turns out
that the problem is as hard as stable partition, i.e., if there were a truly
simple solution then stable partition would also have a truly simple
solution, and vice versa. Nevertheless, we present a simple self-contained
solution that uses $O(\log n)$ space, and indicate how to improve it to $O(1)$
space with the same techniques used for stable partition. If the points
inside the convex hull can be discarded, then there is a truly simple
solution that uses a single call to stable partition, and even that call can
be spared if only extreme points are desired (and not their order). If the
polygonal line is closed, then the problem admits a very simple solution
which does not call for stable partitioning at all.

1 Introduction 

An algorithm is space-efficient if its implementation requires little or no extra
memory beyond that which is needed to store the input. In-place algorithms
are tricky to devise due to the limited memory considerations. For the classical
sorting problem, both quicksort and heapsort are in-place algorithms (it is
well-known that the first can be implemented with a logarithmic expected amount of
extra memory, and the second with a constant amount [4]). It turns out that
devising in-place merge and mergesort is a challenge [6,8,9]. Many other classical
problems have been considered when space is dear.

Recently, several classical problems of computational geometry have been
revisited with space requirements in mind. The two-dimensional convex hull of points
is one such problem; it has been solved in almost every respect in the past twenty
years: there are optimal, output-sensitive solutions which compute the smallest
convex polygon enclosing a set of points in the plane. In [3], Brönnimann et al.
give optimal in-place algorithms for computing two-dimensional convex hulls. For
this problem, the points on the convex hull can be reordered at the beginning
of the array, so that the output merely consists of a permutation of the input
(encoded in the input array itself), and the number of points on the hull.

Space-efficient algorithms have many advantages over their classical coun- 
terparts. Mostly, they necessitate little memory beyond the input itself, so they 
typically avoid virtual memory / paging and external I/O bottlenecks (unless 
the input itself is too large to fit in primary memory, in which case I/O-efficient 
algorithms can be used). 

Convex hull of a simple polygonal line. Computing the convex hull of a simple
polygonal line (either open or closed) is another of the classical problems of
computational geometry, and was long suspected to be solvable in linear time.
Unfortunately, correct algorithms are outnumbered by the algorithms
proposed in the literature that have turned out to be flawed [1]. Two algorithms
that are correct are Lee’s algorithm [10] (a variant of Graham’s scan for closed
polygonal chains) and Melkman’s [11] (which works for open polygonal chains as
well, in an online fashion).

The problem is two-fold: the polygonal line can be either closed or open. 
For closed chains, we can use Lee’s stack-based algorithm and implement the 
stack implicitly in the array, using the fact that the vertices on the convex 
hull are sorted. This leads to a linear-time constant-space solution presented in 
Section 2.1. 

For open chains, the problem is complicated by the fact that either endpoint 
may lie inside the convex hull. The only solutions known use a deque and we show 
how to encode a deque implicitly with logarithmic extra storage in Section 3.1. 
We then improve the storage required to constant size using known techniques. 
Another solution can be obtained by using the relationship mentioned in [3] 
between convex hulls and stable partition and by reusing space from points that 
have been discovered to be non-extreme. 

Other related work. There seems to be little point in devising in-place algorithms when
the output requires linear space simply to write down, so one may assume that
the output is a permutation of the input or can otherwise be represented in
a small amount of memory (e.g., the answer to many geometric optimization
problems typically consists of identifying a constant-sized subset of the input).
Recently, Chen and Chan [5] proposed another model in which the output is
written to an output stream and never read again; only a limited amount of extra
memory is available for workspace. They gave a solution for the problem of counting
or enumerating the intersections among a set of n line segments in the plane, with
$O(\log^2 n)$ extra memory. There could be up to $\Theta(n^2)$ intersections, but they are
written to the output stream and never needed again beyond what is stored
in the working memory. A similar model holds for other classical
problems such as Voronoi diagrams and 3D convex hulls.

Equivalence with stable partition. There are linear-time algorithms for
performing stable partition in-place (i.e., sorting an array of 0’s and 1’s in-place
while respecting the relative order of the 0’s and of the 1’s); see the papers by
Munro, Raman, and Salowe [12] and Katajainen and Pasanen [7]. These algorithms
are not simple, however. Nevertheless, efficient implementations are provided as a
routine in modern algorithm libraries, e.g. the C++ STL. A truly practical
implementation may use available extra storage to speed up the computation, and only
resort to the more involved algorithms mentioned above if no extra storage can
be spared. Hence it makes sense to try to obtain simple algorithms that use
stable partition as a subroutine.

The partitioning problem itself is linear-time reducible to convex hull in the
following way: Given an array A of 0’s and 1’s, compute the convex hull of the
polygonal line defined by $B[i] = (i,\ i(n - i)(A[i] - 0.5))$. These points lie on
two parabolas $y = \pm\frac{1}{2}x(n - x)$, and therefore all appear on the boundary of the
convex hull, first the points for which A[i] = 0 in order on the lower parabola
and those for which A[i] = 1 in reverse order on the upper parabola. Thus the
stable partition can be constructed in linear time by reversing the 1’s in the final
array. It thus appears difficult to have a truly simple linear-time algorithm for
computing the convex hull of a simple polygonal line, given that no truly simple
linear-time algorithm exists for stable partition.
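
The reduction can be illustrated with a short sketch. It maps each bit to a point on one of the two parabolas (with the y-coordinate doubled so that all coordinates stay integers, which does not change the hull), computes the hull, and reads the stable partition off the counterclockwise hull order. Andrew's monotone chain is swapped in here as a generic hull routine purely for illustration; it is not the in-place polygonal-line algorithm of this paper, and the function names and the 1-based index output are assumptions of the sketch.

```python
def cross(o, a, b):
    """Twice the signed area of triangle oab."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull in counterclockwise order,
    starting at the leftmost point."""
    pts = sorted(points)
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def stable_partition_via_hull(bits):
    """Return the 1-based indices of `bits` in stable-partition order
    (all 0's first, then all 1's, each group in its original order)."""
    n = len(bits)
    # Twice B[i]'s y-coordinate: i * (n - i) * (2 * A[i] - 1).
    points = [(i, i * (n - i) * (2 * bits[i - 1] - 1)) for i in range(1, n + 1)]
    hull = [x for x, _ in convex_hull(points)]  # x-coordinate is the index
    # Rotate the cyclic hull order so that it starts at the first 0.
    starts = [k for k in range(len(hull))
              if bits[hull[k] - 1] == 0 and bits[hull[k - 1] - 1] == 1]
    if not starts:  # all bits equal: nothing to partition
        return list(range(1, n + 1))
    rotated = hull[starts[0]:] + hull[:starts[0]]
    zeros = [i for i in rotated if bits[i - 1] == 0]
    ones = [i for i in rotated if bits[i - 1] == 1]
    return zeros + ones[::-1]  # the 1's appear in reverse order on the hull
```

For the array 1, 0, 1, 1, 0, 0, 1 the hull visits the indices 2, 5, 6 on the lower parabola and 7, 4, 3, 1 on the upper one, so reversing the second group yields the stable partition 2, 5, 6, 1, 3, 4, 7.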

It turns out that by using stable partition as an oracle, we obtain a very 
simple algorithm. If we are not interested in the order of the points on the 
convex hull, then we may even forego stable partition altogether, and therefore 
obtain a truly simple algorithm given in Section 3.3. 



2 Closed Chains 

For closed simple polygons, the solution turns out to be very simple: the input is
represented in an array A[1] . . . A[n] of vertices, which can be cyclically permuted
at will. One may therefore assume that A[1] is a vertex of the convex hull (e.g.
the vertex of minimum abscissa). There is an algorithm due to Lee which closely
resembles Graham’s scan and uses only one stack [10]. We give a brief outline of
the algorithm first, then show how to implement it in a space-efficient manner.



2.1 Overview of Lee’s Algorithm 

In this description, we assume that the polygon is oriented counterclockwise
and that A[1] is minimal in some direction. Fortunately, the orientation of the
polygon is given by the order type of (A[n], A[1], A[2]) and can be computed
in $O(1)$ time. Should it turn out otherwise, the whole array can be reversed. The
invariant is that the vertices on the stack form a convex polygon (when viewed as
a circular sequence). That is, the vertices in the stack form a convex polygonal
line, and the point at the bottom of the stack (i.e. A[1]) is always to the left of
the line joining two consecutive points in the stack.

Lee’s algorithm starts by pushing the first two vertices on the stack, and 
maintains the line L that connects the top two vertices on the stack, as well as 
the line L' joining the bottom and top vertices of the stack. When a point is 
processed, it may fall into several regions as depicted in Figure 1 (left). There are 
several cases: 

• If the current vertex is not to the left of L, restore the invariant by 
backtracking/deleting vertices from the top of the stack until a convex turn is 
encountered, or there is only one vertex left on the stack. Then push the current 
vertex on the stack and recompute L and L'. 

• If the current vertex is to the left of L but not to the left of the line L' (i.e. falls 
into the pink or yellow region), then ignore it and process the next vertex. 

• If the current vertex is to the left of both L and L', then push the current vertex 
on the stack and recompute L and L'. 

Note that in the third case, both lines L and L' always rotate counterclockwise, 
but in the first case, after popping vertices from the stack they may rotate either 
way. In particular, some previously processed vertices may end up on the left side 
of L' . The algorithm therefore does not maintain the invariant that the stack 
contains the convex hull of the vertices seen so far. It does however maintain 
the invariant that the stack contains the prefix of the convex hull ending at the 
vertex on top of the stack, and that the subsequent points (from the top of the 
stack to the current point) are to the right of L' . In particular, if the last point 
has maximal abscissa, the stack contains the lower convex hull of the points, a 
fact that will be useful later. And more importantly, Lee proved that if the chain 
is closed, then after all the points have been processed the stack contains the 
entire convex hull. 
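The three cases can be sketched in a few lines. This is an illustrative sketch, not Lee's original formulation: it assumes the polygon is simple, closed, given counterclockwise with its first vertex on the hull, and in general position (no three collinear points). The names `cross` and `lee_hull` are hypothetical. L is taken as the directed line through the top two stack vertices and L' as the directed line from the bottom to the top of the stack.

```python
def cross(o, a, b):
    """> 0 if b lies to the left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lee_hull(poly):
    """One-stack scan over a closed simple polygon (see assumptions above)."""
    stack = [poly[0], poly[1]]
    for p in poly[2:]:
        if cross(stack[-2], stack[-1], p) <= 0:
            # Case 1: p is not to the left of L. Pop until a convex turn
            # appears or a single vertex remains, then push p.
            while len(stack) >= 2 and cross(stack[-2], stack[-1], p) <= 0:
                stack.pop()
            stack.append(p)
        elif cross(stack[0], stack[-1], p) > 0:
            # Case 3: p is to the left of both L and L'.
            stack.append(p)
        # Case 2 (left of L, not left of L'): p is ignored.
    return stack


dented_square = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
notched = [(0, 0), (3, 1), (5, -1), (6, 3), (1, 4)]
```

On `dented_square` the vertex (2, 1) falls under case 2 and is skipped; on `notched` the vertex (3, 1) is pushed first and later popped by case 1.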




Fig. 1. (left) The two types of regions for processing a point in Lee’s algorithm. The 
points on the stack are in blue, the blue line is L and the red line is L' . (right) The 
four types of regions for processing a point in Melkman’s algorithm. (Both diagrams 
courtesy of Greg Aloupis). 



2.2 A Space-Efficient Implementation 

Implementing the previous algorithm in place is trivial since the stack can be 
stored in the prefix of the array. The minimal-abscissa vertex can be found and 
the entire array permuted, both in linear time and with a constant amount of 
extra memory. Moreover, the points inside the convex hull can be kept in the array, 
if we are careful to perform swaps instead of assignments when popping from 
and pushing into the stack. The algorithm then produces a permutation of the 
input and an index h such that A[1]..A[h] form a convex polygon and the points 
in A[h+1]..A[n] are inside the convex hull. Note that this algorithm is not much 
different from the in-place Graham-Andrew scan proposed by Brönnimann et al. 
[3], when the points have already been sorted by abscissa; the only modification 
is the use of the line L', and the fact that the points haven’t been sorted but 
instead lie on a simple polygonal line. The runtime of the algorithm is 
clearly linear, and it uses O(1) extra memory. 
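A possible in-place rendering of the scan keeps the stack in the array prefix and uses swaps so that no input point is lost. This is a sketch under the same assumptions as the one-stack algorithm (closed simple polygon, counterclockwise, first vertex on the hull, general position); the function name `lee_hull_in_place` is hypothetical, and the code uses 0-based indices, so the hull ends up in `A[0:h]`.

```python
def cross(o, a, b):
    """> 0 if b lies to the left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lee_hull_in_place(A):
    """Permute A so that A[:h] is the hull (in order); return h."""
    h = 2  # the stack occupies A[0:h]
    for i in range(2, len(A)):
        p = A[i]
        if cross(A[h - 2], A[h - 1], p) <= 0:
            # Pop: shrinking h leaves the popped points in place behind the stack.
            while h >= 2 and cross(A[h - 2], A[h - 1], p) <= 0:
                h -= 1
            A[h], A[i] = A[i], A[h]  # push by swapping, never overwriting
            h += 1
        elif cross(A[0], A[h - 1], p) > 0:
            A[h], A[i] = A[i], A[h]
            h += 1
        # otherwise the point is skipped and stays where it is
    return h


A = [(0, 0), (3, 1), (5, -1), (6, 3), (1, 4)]
h = lee_hull_in_place(A)
```

Because h never exceeds i, the cell A[h] that is swapped out always holds a point that was already popped or skipped, so the suffix collects exactly the non-hull points.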



2.3 Open Chains: A Special Case 

Although this algorithm only works for closed chains (see the next section for 
open chains), it also works for open chains in the special case where both 
endpoints are extreme vertices. For simplicity of the discussion, we assume that 
A[1] has minimal abscissa, and A[n] is maximal. This lets us use the fact that 
the convex hull is a concatenation of the lower hull (from A[1] to A[n]) and the 
reverse of the upper hull (from A[n] to A[1]). 

As observed above, running the one-stack algorithm from A[1] to A[n] 
would produce the lower hull, and from A[n] back to A[1] the upper hull. 
Unfortunately, we cannot run both, but we can separate the points above and below 
the line L joining A[1] to A[n] by using stable partition, and reversing the second 
half. This gives us a polygonal line $\mathcal{C}_1$ joining A[1] to A[n] that stays below the 
straight line L, followed by another polygonal line $\mathcal{C}_2$ starting at A[n] that stays 
above L. 

Unfortunately, the resulting polygonal lines are not necessarily simple, but 
they have structure: in particular the lower hull of $\mathcal{C}_1$ is the same as that for 
$\mathcal{C}$, and the vertices of $\mathcal{C}_1$ occur in the same order they occur on $\mathcal{C}$ (this is 
a consequence of Jordan’s curve theorem, proved in [2]). This is sufficient to 
ensure that the one-stack algorithm still works as intended on $\mathcal{C}_1$ and produces 
the lower hull of $\mathcal{C}$. Similarly, running the algorithm on $\mathcal{C}_2$ produces the upper 
hull of $\mathcal{C}$. The two hulls can be concatenated in place to form the whole convex 
hull of $\mathcal{C}$. 
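The splitting step can be illustrated as follows. This sketch is not in place: two list comprehensions stand in for the stable partition subroutine, and the name `split_chain` is hypothetical. It assumes the first point has minimal abscissa, the last has maximal abscissa, and no interior point lies exactly on the line through them.

```python
def cross(o, a, b):
    """> 0 if b lies to the left of (above, for a left-to-right line) o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def split_chain(A):
    """Return (lower chain, reversed upper chain) relative to the line A[0] -> A[-1]."""
    first, last = A[0], A[-1]
    inner = A[1:-1]
    below = [p for p in inner if cross(first, last, p) < 0]  # order preserved
    above = [p for p in inner if cross(first, last, p) > 0]  # order preserved
    lower_chain = [first] + below + [last]
    upper_chain = [last] + above[::-1] + [first]
    return lower_chain, upper_chain


chain = [(0, 0), (2, -2), (3, 2), (5, -1), (6, 0)]
lower_chain, upper_chain = split_chain(chain)
```

Each of the two chains can then be fed to the one-stack scan, and the two results joined at their shared endpoints.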



3 Open Chains 

While the former algorithm works easily for closed chains, it does not work for 
open polygonal chains, due to the fact that some vertices might be removed from 
the stack but appear to the left of the line L' and therefore contribute to the 
convex hull. Melkman [11] showed how to use a deque instead of a stack to cope 
with this problem. We give a brief outline of his algorithm first, then show how 
to adapt the implementation to make it space-efficient. 



3.1 Overview of Melkman’s Algorithm 

The points are processed in the order in which they appear in the input array. 
Melkman’s algorithm maintains their convex hull as a deque (insertion and deletion 
from both front and back), which is initially the triangle formed by the first 
three points. For simplicity, we describe a version that uses a deque and one 
extra point, stored in a special register, which contains the last point added to 
the convex hull. 

When a point is processed, it can fall into four types of regions, depicted in 
figure 1 (right); note that it cannot fall into any other region without violating 
the simplicity of the polygonal line. These regions are determined solely by the 
point in the special register and the front and back vertices of the deque. The 
invariant of the algorithm is that the special point and the vertices in the deque 
form a convex polygon, when viewed as a circular sequence. The simplicity of 
the polygonal line, together with Jordan’s curve theorem, implies that when a 
point comes out of the yellow region, it always does so through one of the two 
edges of this polygon that join the special point to the top or the bottom vertex 
of the deque. 

• If yellow, ignore this and all following vertices until one emerges into the 
other regions. 

• If red, push the point in the special register onto the front of the deque, then 
insert the current point into the special register. To restore the invariant, 
backtrack/delete vertices from the front of the deque until a convex turn is 
encountered. 

• If green, push the point in the special register onto the back of the deque, 
then insert the current point into the special register. Restore the invariant 
by backtracking/deleting vertices from the back of the deque until a convex 
turn is encountered. 

• If blue, simply replace the point in the special register by the current point, 
and restore the invariants as in both cases red and green. 

This process is repeated for every point in turn. Note that the algorithm is 
completely symmetric and therefore does not assume any orientation of the 
polygonal line. In fact, the first point of the array does not need to appear on 
the final convex hull, nor does the chain need to be closed. The algorithm is 
online, meaning that points can be added in a streaming fashion. 
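For comparison, here is a sketch of the common textbook formulation of Melkman's algorithm. It differs from the variant described above in one respect, which is an assumption of this sketch: instead of a special register, the most recently added hull point is stored at both ends of the deque. It assumes a simple polygonal chain in general position; the names `cross` and `melkman` are hypothetical.

```python
from collections import deque


def cross(o, a, b):
    """> 0 if b lies to the left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def melkman(chain):
    """Counterclockwise convex hull of a simple polygonal chain."""
    a, b, c = chain[0], chain[1], chain[2]
    # Start with the first triangle in counterclockwise order; c sits at both ends.
    d = deque([c, a, b, c]) if cross(a, b, c) > 0 else deque([c, b, a, c])
    for p in chain[3:]:
        # p is inside the current hull: ignore it.
        if cross(d[0], d[1], p) > 0 and cross(d[-2], d[-1], p) > 0:
            continue
        # Restore convexity at the back, then at the front.
        while cross(d[-2], d[-1], p) <= 0:
            d.pop()
        d.append(p)
        while cross(d[0], d[1], p) <= 0:
            d.popleft()
        d.appendleft(p)
    d.pop()  # drop the duplicated end point
    return list(d)


open_chain = [(0, 0), (2, -2), (3, 2), (5, -1), (6, 0)]
spike_chain = [(0, 0), (4, 0), (2, 3), (2, 1)]
```

On `open_chain`, the point (5, -1) removes the duplicate of (3, 2) from the back only, while (6, 0) removes the duplicate of (5, -1) from the front only; on `spike_chain` the last point lies inside the first triangle and is skipped.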



3.2 A Space-Efficient Implementation Using Implicit Pointers 

The main problem is how to implement a deque of n elements in place, i.e., using 
only the first n cells of the array when n points have been processed. This is a 
non-trivial task, at least as hard as stable partitioning. We show that techniques 
developed for stable partitioning can actually be adapted to solve our problem. 

If we represent the deque as a doubly linked list, then each deque operation 
can trivially be accomplished in constant time. The problem with this approach 
is of course the extra space needed for the pointers. The key idea is that pointers 
need not be stored explicitly but can be encoded implicitly via permutations of 
the input elements: since the points in the deque form a convex polygon, they 
are sorted by, e.g., angular order. One way to do that is to fix the origin inside 
the convex hull (e.g., the barycenter of the first three points) and pick a direction 
(e.g., the horizontal). In the sequel, when we say that a is less than b, we mean 
that its principal polar angle is less than that of b. 

In more detail, we store the first few and last few elements in two separate 
small deques (the front deque and back deque). The rest of the elements are 
stored inside the given array, which is divided into blocks of size $s = 4\lceil \log_2 n \rceil$. 
The blocks are linked together, in order, by the following scheme: Within each 
block, we encode $2\lceil \log_2 n \rceil$ bits by pairing consecutive elements and permuting 
each pair (a, b) so that having a less than b means the corresponding bit is a 0 and 
vice versa. These bits form the two pointer fields (the successor and predecessor) 
of the doubly linked list. 
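The pair-permutation encoding can be sketched as follows. This is an illustration only: the function names (`angle`, `encode_bits`, `decode_bits`) are hypothetical, the origin defaults to (0, 0) rather than a point chosen inside the hull, and the points of a block are assumed to have pairwise distinct polar angles. Bit j of the stored value is carried by the order of the j-th pair.

```python
import math


def angle(p, origin=(0.0, 0.0)):
    """Polar angle of p around origin, normalised to [0, 2*pi)."""
    return math.atan2(p[1] - origin[1], p[0] - origin[0]) % (2 * math.pi)


def encode_bits(block, value):
    """Store `value` in the block by ordering each consecutive pair of points."""
    for j in range(len(block) // 2):
        a, b = block[2 * j], block[2 * j + 1]
        lo, hi = (a, b) if angle(a) < angle(b) else (b, a)
        bit = (value >> j) & 1
        # Increasing angle encodes 0, decreasing angle encodes 1.
        block[2 * j], block[2 * j + 1] = (lo, hi) if bit == 0 else (hi, lo)


def decode_bits(block):
    """Read back the value stored by encode_bits."""
    value = 0
    for j in range(len(block) // 2):
        if angle(block[2 * j]) > angle(block[2 * j + 1]):
            value |= 1 << j
    return value


block = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
original = set(block)
encode_bits(block, 0b1011)
stored = decode_bits(block)
```

A block of 2b points holds b bits this way, and the set of points in the block is never changed, only their order within each pair.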

Insertions/deletions to the front/back are done directly within the two small 
deques, whose sizes are kept between 0 and 2s. When the size of the front/back 
deque reaches 2s (a full event), we extract s elements from it, form a new 
block, and update two pointers from and to the new block. When the size of 
the front/back deque reaches 0 (an empty event), we take out the first/last 
block of the linked list, and insert its s elements into the small deque; 
furthermore, to ensure that the used blocks occupy a prefix of the array, we swap the 
deleted block with a block at the end of the array and readjust a constant 
number of pointers. After a full event, the corresponding small deque has exactly s 
elements and hence the next event will not occur for another s operations. Each 
such event processing requires O(s) time, but two events are separated by at 
least s insertions/deletions, so the amortized cost per insertion/deletion in the 
deque is O(1). 

The extra space used is O(log n), for the two small deques. By more 
theoretical tricks, the space complexity can be made even smaller. One option is to 
handle the small deques recursively, by dividing their elements into tinier blocks 
in the same manner. Similar to Munro, Raman, and Salowe’s stable partition 
method [12], this should result in an O(log* n) space bound. Another option is 
to recurse for just two levels until the deque size is small enough (O(log log n)) 
so that all pointers can be packed into a single word, and pointer manipulations 
can be done in O(1) RAM operations by table lookup. This is analogous to 
Katajainen and Pasanen’s stable partition method [7] and should yield O(1) space. 
Since either option is probably too complicated for actual implementation, we 
will not elaborate further on these refinements. 

At the end, to produce the convex hull vertices in order in a prefix of the 
array, we can simply perform repeated deletions from one end of the deque. 
Although consecutive pairs have been permuted by the above process, we can 
permute the pairs back, knowing that they should form a convex polygon. As 
before, by being careful, we can ensure that points not on the hull boundary 
remain in a suffix of the array. 



3.3 A Simpler, “Destructive” Implementation 

If points not on the hull boundary need not be in the final array and can be 
destroyed, we can give a simple algorithm that directly reduces the problem 
to stable partitioning. In fact, if the convex hull vertices need not be ordered 
in the final array (i.e., we just want to identify the set of extreme points), we 
can avoid the stable partitioning subroutine altogether and thus obtain a truly 
simple algorithm. 

The problem is again how to implement the deque in-place. The key idea 
is this: if there are no deletions (i.e., all the points are on the boundary of the 
convex hull), then nothing needs to be done: all the vertices are extreme; in fact, 
in this case simply stably partitioning the points with respect to the line joining 
the first and the last point and reversing the second portion produces the convex 
hull. But if there are many deletions, cells of deleted elements can be used to 
store other information (pointers, for example). 

We describe one approach based on this idea. For simplicity’s sake, we assume 
that each cell can hold two extra bits for marking purposes (live or dead, and − 
or +); later we will discuss how this assumption can be removed. The deque has 
to be stored within the first n cells of the array, where n is the current number 
of insertions (not the number of elements currently in the deque). 

Basically, the deque is decomposed into two stacks: elements of sign − in the 
array form the front part reversed, and elements of sign + form the back part. 
Insertion is straightforward: just put the new element at the end of the array, 
and mark it − or + depending on whether we are inserting to the front or back 
of the deque. 

An element is deleted by marking it as dead. To speed up computation, we 
use dead cells to store pointers as follows: Consider the elements of the array of 
one sign, in left-to-right order. They form a sequence of alternating blocks of live 
elements and dead elements. The invariant is that the rightmost element of each 
dead block should hold a pointer (i.e., the index) to the rightmost element of 
the preceding live block. See figure 2 for an illustration, where the O’s represent 
live elements and the X’s represent dead elements. 








Fig. 2. Representing a deque (or two stacks) in a single array. Live and dead elements 
of one sign are shown. 



It is not difficult to maintain this invariant after a deletion: just imagine 
when the rightmost O is changed to an X in figure 2; several cases may arise, 
but only a constant number of pointers need to be updated. To demonstrate the 
simplicity of the approach, we provide complete pseudocode of the insertion and 
deletion procedure below. Here, ℓ_σ points to the rightmost live element of sign 
σ ∈ {−, +}, and d_σ points to the rightmost dead element of sign σ. 

Insert_σ(x): 

1. ℓ_σ = k = k + 1, A[k] = x 

2. mark A[k] live and of sign σ 

Delete_σ(): 

1. mark A[ℓ_σ] dead 

2. if ℓ_σ > d_σ then d_σ = ℓ_σ 

3. i = predecessor of ℓ_σ among elements of sign σ 

4. if predecessor exists then 

5. if A[i] is live then ℓ_σ = A[d_σ] = i else ℓ_σ = A[d_σ] = A[i] 

6. else { 

7. compress array by keeping only live elements 

8. k = size of compressed array 

9. reverse first half of array and switch the sign of these elements 

10. } 

Searching for the predecessor of a live element (line 3) is a non-constant-time 
operation and requires time proportional to the distance between the element 
and its predecessor. However, this search is done at most once for each element 
(when its status changes from live to dead), so the total time is still linear in 
the size of the array. 

One scenario has not yet been addressed: what if we run out of elements 
of one sign? This can be fixed by a standard amortization trick (used in the 
well-known two-stack simulation of a deque): we just re-divide the deque in the 
middle and start a new phase, as described in lines 7-10. (Notice that lines 7 
and 9 can be done in-place easily.) If the i-th phase initially has $k_i$ elements and 
ends after $m_i$ insertion/deletion operations, then the phase requires $O(k_i + m_i)$ 
total time. Because the above strategy ensures that $m_i \ge k_i/2$, the running 
time of the i-th phase is $O(m_i)$ and the overall running time is O(n), i.e., the 
amortized cost per update remains O(1). 
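The marked-cell deque can be prototyped as below. This is a sketch under explicit assumptions, not a transcription of the pseudocode: the class name `MarkedDeque` and its fields are hypothetical, the live/dead and sign marks are kept in separate Python lists instead of two stolen bits, the value −1 serves as a "no live predecessor" pointer, and the re-division is performed lazily (when a deletion finds its stack empty) rather than at the moment the stack runs out.

```python
class MarkedDeque:
    """Deque stored as two interleaved stacks ('-' = front, '+' = back)."""

    def __init__(self):
        self.val, self.live, self.sign = [], [], []
        self.last = {'-': -1, '+': -1}  # rightmost live cell of each sign
        self.dead = {'-': -1, '+': -1}  # rightmost dead cell of each sign

    def insert(self, s, x):
        """Add x at the front (s == '-') or at the back (s == '+')."""
        self.val.append(x)
        self.live.append(True)
        self.sign.append(s)
        self.last[s] = len(self.val) - 1

    def delete(self, s):
        """Remove and return the front (s == '-') or back (s == '+') element."""
        if self.last[s] < 0:
            self._redivide(s)
        l = self.last[s]
        if l < 0:
            raise IndexError("deque is empty")
        x = self.val[l]
        self.live[l] = False
        if l > self.dead[s]:
            self.dead[s] = l
        i = l - 1  # scan left for the predecessor of the same sign
        while i >= 0 and self.sign[i] != s:
            i -= 1
        if i < 0:
            ptr = -1              # no element of this sign remains
        elif self.live[i]:
            ptr = i
        else:
            ptr = self.val[i]     # dead cell: follow its stored pointer
        self.last[s] = self.val[self.dead[s]] = ptr
        return x

    def _redivide(self, s):
        """Compress to live cells; give the reversed first half the sign s."""
        items = [v for v, alive in zip(self.val, self.live) if alive]
        h = (len(items) + 1) // 2
        items[:h] = items[:h][::-1]
        other = '+' if s == '-' else '-'
        self.val = items
        self.live = [True] * len(items)
        self.sign = [s] * h + [other] * (len(items) - h)
        self.last = {s: h - 1, other: len(items) - 1 if len(items) > h else -1}
        self.dead = {'-': -1, '+': -1}
```

Deleted values are overwritten by pointers, which is what makes the scheme destructive; a caller that needs the value must take it from the return value of `delete`.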

At the end, we can compress the array to remove all dead elements and thus 
have the convex hull vertices stored in a prefix of the array. If the vertices are 
required to be ordered, we can invoke a stable partition subroutine to put all −’s 
before all +’s and reverse the − elements; otherwise, our algorithm is completely 
self-contained. 

Finally, if it is not possible to steal two extra bits per cell, we can insert/delete 
to the deque only when we have gathered a pair of elements. We can permute 
the pair (a, b) so that having a left of b means the sign is − and vice versa. A 
dead cell can be signaled by a pair (a, b) with either a or b a point at infinity. 

4 Conclusion 

The problem of computing the convex hull of a simple polygonal line is well- 
known to be solvable in linear time. In this paper, we have shown that it can be 
solved in linear time in-place, and that the in-place problem is as hard as stable 
partition in the following sense: any linear-time algorithm for one implies a not 
too convoluted linear-time algorithm for the other. Given that the algorithms 
for stable partition are rather involved, we do not expect an easy solution for 
this problem either. Nevertheless, we have given a simple O(log n)-space solution 
which can be extended to an O(1)-space solution at the expense of the complexity 
of the implementation. If the chain is closed, the problem admits of a very simple 
in-place linear-time solution, which does not call for stable partitioning at all. 
If the chain is open but both endpoints are extreme, then a single call to stable 
partition and two calls to the same very simple in-place linear-time algorithm 
solve the problem. 



Abstract. The bisection method is the consecutive bisection of a triangle 
by the median of the longest side. This paper introduces a taxonomy 
of triangles that precisely captures the behavior of the bisection method. 
Our main result is an asymptotic upper bound for the number of sim- 
ilarity classes of triangles generated on a mesh obtained by iterative 
bisection, which previously was known only to be finite. We also prove 
that the number of directions on the plane given by the sides of the 
triangles generated is finite. Additionally, we give purely geometric and 
intuitive proofs of classical results for the bisection method. 



1 Introduction 

Longest-side bisection algorithms for the refinement of 2-dimensional triangula- 
tions were developed to fill a gap in the design of adaptive software for finite 
element applications to analyze physical problems described by partial differen- 
tial equations, where the availability of algorithms able to produce automatic 
and local refinement of the mesh is crucial. A discussion of the algorithms and 
some generalizations can be found in [4,5]. These algorithms were designed to 
take advantage of the non-degeneracy properties of the iterative longest-side 
bisection (bisection method) of triangles, which essentially guarantee that 
consecutive bisections of the triangles nested in any triangle $t_0$ of smallest angle $\alpha_0$ 
produce triangles t (of minimum angle $\alpha_t$) such that $\alpha_t \ge \alpha_0/2$, and where the 
number of non-similar triangles generated is finite. 

The systematic study of the bisection method began in a series of papers 
[2,7,8,9,1] around two decades ago. First, Rosenberg and Stenger [7] proved that 
the method does not degenerate the smallest angle of the triangles generated 
by showing that it does not decrease beyond $\sigma/2$, where $\sigma$ is the smallest angle 
of the triangle we started with. 

Then Kearfott [2] proved a bound on the behavior of the diameter (the length 
of the longest side of any triangle obtained). In [8] a better bound was presented 
for certain triangles. This bound was improved independently by Stynes [9] and 
Adler [1] for all triangles. From their proofs they also deduced that the number of 
classes of similarity of triangles generated is finite, although they give no bound. 

There is very little research so far on complexity aspects of the bisection 
method. Although it is known that different types of triangles behave radically 
differently under iterative bisection (“good” and “bad” triangles), no systematic 
classification of them is known. 

This paper attempts to fill these gaps in the analysis of the bisection method. 
We present a precise taxonomy that captures the behavior of the bisection 
method for different types of triangles. We introduce as main parameter the 
smallest angle and prove that in the plane it predicts faithfully the behavior of 
the bisection method. We use this framework to prove new results and to give 
intuitive proofs of classical results. 

The contributions of this paper are as follows: 

— A taxonomy of triangles reflecting the behavior of the bisection method. We 
consider six classes of triangles, and two main groups. 

— An asymptotic bound on the number of non-similar triangles generated. We 
prove a super-polynomial upper bound, identify the instances where this 
bound is polynomial, and describe worst case instances. 

— An analysis of lower bounds on the smallest angle of triangles in the mesh 
obtained using the bisection method for each class of triangles defined. 

— A proof that there is a finite number of directions in the plane generated by 
the corresponding segments (sides) of the triangles generated, and asymp- 
totic bounds on this number. 

Additionally, we present a unified view of the main known results for the 
bisection method from an elementary geometry point of view. This approach 
allows intuitive proofs and has the advantage of presenting the geometry inherent 
to the method. 

2 Notation and Preliminaries 

Capital letters denote points on the plane. In order to simplify we will avoid 
extra symbols and sometimes overload some notations. AB denotes a segment 
as well as the length of this segment usually denoted by $\overline{AB}$. An angle $\angle ACB$ 
denotes the actual instance as well as the value (measure) of it. A circumference 
of center A and radius r is denoted by C(A, r). 




Fig. 1. Triangle ABC with AB > BC > CA. D is the midpoint of AB. 



A bisection, by the median of the longest side, of triangle ABC with AB > 
BC > CA, is the figure obtained by tracing the segment CD, where D is the 
midpoint of the longest segment AB. See Figure 1. We will study the properties 
obtained by successively bisecting the triangles so obtained. 
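Iterated longest-side bisection is easy to experiment with. The sketch below uses exact rational arithmetic so that similarity classes can be compared reliably; all names (`sq`, `bisect`, `shape`, `similarity_classes`) are hypothetical. Two assumptions are worth stating: ties for the longest side are broken by taking the first one found, and two triangles are treated as similar when their sorted squared side lengths are proportional (so mirror images count as similar).

```python
from fractions import Fraction


def sq(p, q):
    """Squared distance between points p and q."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def bisect(tri):
    """Split tri by the median to the midpoint of its longest side."""
    a, b, c = tri
    sides = [(sq(a, b), a, b, c), (sq(b, c), b, c, a), (sq(c, a), c, a, b)]
    _, p, q, r = max(sides, key=lambda s: s[0])  # (p, q) is the longest side
    m = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return (p, m, r), (m, q, r)


def shape(tri):
    """Similarity signature: the two smaller squared sides over the largest."""
    a, b, c = tri
    s = sorted([sq(a, b), sq(b, c), sq(c, a)])
    return (s[0] / s[2], s[1] / s[2])


def similarity_classes(tri):
    """Count similarity classes reachable from tri by repeated bisection."""
    seen = {shape(tri)}
    todo = [tri]
    while todo:
        t = todo.pop()
        for child in bisect(t):
            key = shape(child)
            if key not in seen:  # only new shapes can produce new shapes
                seen.add(key)
                todo.append(child)
    return len(seen)


F = Fraction
right_345 = ((F(0), F(0)), (F(4), F(0)), (F(0), F(3)))
right_isosceles = ((F(0), F(0)), (F(1), F(0)), (F(0), F(1)))
```

For the 3-4-5 right triangle the search stops after three classes (the triangle itself and the two isosceles halves), and for the right isosceles triangle every descendant is similar to the original.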

For a given triangle PQR, denote by $\sigma_{PQR}$ (respectively $\gamma_{PQR}$) the value 
of the smallest (respectively greatest) angle in triangle PQR, and by $\beta_{PQR}$ the 
remaining angle. 

We will need a simple and useful technical lemma: 







Fig. 2. BI is the angle bisector, BH and CD are medians, G is the center of gravity. 



Lemma 1. For $\triangle ABC$ with AB > BC > CA, it holds that $\angle BCD \ge \frac{1}{2}\angle DBC$. 

Proof. (See Figure 2.) Let ABC be a triangle with AB > BC > CA, let BI be 
the bisector of $\angle ABC$, let BH and CD be medians, and let G be its center 
of gravity. From AB > BC > CA and elementary geometry it follows that 
$BG \ge CG$, hence $x \ge z \ge y/2$. Note that x = y/2 if and only if AB = AC. 

To simplify the study of the bisection method, it is convenient to group two 
or three consecutive bisections in triangle ABC, in what we will call a step, 
as follows. For this discussion refer to Figure 3. Let E be the midpoint of 
segment CB. Note that if CD > CE, then CD, DE and EF are consecutive 
bisections by the median of the longest side, and after these bisections we get 
exactly three non-similar triangles: ADC, CDE and CDB (all others are 
similar to one of these, see left side of Figure 3). We call these three consecutive 
bisections a step of type A. Note that $\triangle ADC$ is the only triangle that possibly 
generates new triangles non-similar to already generated ones. 




Fig. 3. Steps: Of type A on the left when CD > CE, and of type B on the right when 
CD < CE. Vertices D, E and F are midpoints of the corresponding segments. 



On the other hand, if CD < CE, the longest side in triangle CDE is now CE. 
Hence we bisect only twice (CD and DE) and get two new triangles, namely 
ADC and CDE (see right side of Figure 3). We call these two consecutive 
bisections a step of type B. Note that for type B bisections, triangles ADC and 
CDE are the only triangles that could generate new triangles non-similar to 
already generated ones. 

3 A Classification of Triangles 

The behavior of the bisection method depends on the type of triangle to be 
bisected. We will partition the set of all triangles in classes that reflect this 
behavior by considering some elementary geometrical properties. The starting 
point will be a triangle ABC as in Figure 1. 










Fig. 4. Regions 



Region   Defining properties       Other properties      Step type
I        AD < CD < AC              $\gamma < \pi/2$      A
II       AD < AC < CD              $\gamma < \pi/2$      A
III      AC < AD < CD              $\gamma < \pi/2$      A
IV       AC, CD < AD               $\gamma > \pi/2$      A/B
V        AD < AC; CD < CE          $\gamma > \pi/2$      B
VI       CD < AD < AC; CD > CE     $\gamma > \pi/2$      A

The analysis is based on the geometrical places where vertex C of triangle 
ABC lies, assuming AB > CB > CA. For this discussion, we refer to Figure 4, 
where AB represents the longest side of the hypothetical triangle, D the midpoint 
of AB, M is the midpoint of AD, N is such that AN = AB/3, $MO \perp AB$ 
and $DP \perp AB$. The arc $C_1$ belongs to a circumference C(B, AB), arc $C_2$ to 
C(D, AD), arc $C_3$ to C(N, AN) and finally arc $C_4$ to C(A, AD). 

From the condition AB > BC > CA, it follows that vertex C of a triangle 
with base AB must be in the region bounded by arc AP and lines PD and AD. 
We partition this region into six subregions, denoted by Roman numerals, with 
the property that triangles in the same subregion present similar behavior with 
regard to bisection by the median of the longest side, as stated in Lemma 2. 
Note that arc $C_3$ is the set of points C for which CD = CE, and is precisely 
the geometrical place which separates those triangles for which steps of type A 
apply from those triangles for which steps of type B apply. The table of regions 
above lists the defining properties of triangles in each region. 

Let us consider the process of bisecting iteratively a triangle. In what follows 
by a “new triangle” we mean a triangle not similar to one already generated. 
We will proceed following steps of type A or B, as follows: 

1. Perform a step of the corresponding type (depending on the triangle); 

2. Choose nondeterministically one of the new triangles obtained. If there is no 
such triangle (i.e. all triangles generated are similar to previous ones), stop; 
else goto 1. 

Lemma 2. Let ABC be a triangle. For the iterative process described above it 
holds: 

1. If C is in region I, it generates at most 4 non-similar triangles as shown in 
Figure 5, all of them belonging to region I. 

2. If C is in region II, new $\triangle ADC$ belongs to region I. 

3. If C is in region III, new $\triangle ADC$ belongs either to region II or III. Moreover, 
in no more than $\lceil 5.7\log(\frac{\pi}{6\sigma}) \rceil$ steps the only new triangle generated belongs 
to region II. 

4. If C is in region IV or V, after no more than $\lceil (\gamma - \pi/2)/\sigma \rceil$ steps, the only 
new triangle has $\gamma < \pi/2$ (i.e. belongs to region I, II or III.) 

5. If C is in region VI, new $\triangle ADC$ belongs to region I. 

Proof. 1. Follows from the analysis of the relations among sides of the triangles 
generated. See definition of region I and Figure 5. 




Fig. 5. After bisections in a triangle in Region I 



2. Consider the triangle ABC' in region I, where C' is the reflection of C across 
the line MO. We know that $\triangle ADC'$ is in region I. Now observe that triangle 
$\triangle ADC$ is congruent to $\triangle ADC'$. 

3. First, observe that $\triangle ADC$ has $\gamma < \pi/2$, and $\sigma_{ADC} \ge \frac{3}{2}\sigma_{ABC}$ (because 
$\sigma_{ADC} = \angle ADC$ and Lemma 1). Now, because at each step $\sigma$ is increased by a 
factor of 3/2, it is enough to find the smallest k such that $(\frac{3}{2})^k \sigma > \pi/6$, that is, 
$k > \log(\frac{\pi}{6\sigma})/\log(3/2)$. The solution, denoted by $k(\sigma)$, is $k(\sigma) = \lceil 5.7\log(\frac{\pi}{6\sigma}) \rceil$. 

4. After one step, the only new triangles generated, $\triangle ADC$ and $\triangle CDE$, 
decrease their greatest angle by $\sigma_{ABC}$. Hence it is enough to find the smallest 
k such that $\gamma - k\sigma < \pi/2$. The solution depends on two parameters and is 
$\lceil (\gamma - \pi/2)/\sigma \rceil$. 

5. Just observe that $\gamma_{ADC} < \pi/2$ and $\sigma_{ADC}$ is the same as $\angle CAB$ of $\triangle ABC$. 
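The constant 5.7 in item 3 of Lemma 2 can be traced as follows. This worked step assumes that log denotes the base-10 logarithm, which appears to be the reading under which the constant rounds to 5.7 (with natural or base-2 logarithms the factor would be about 2.47 or 1.71 instead).

```latex
\[
\left(\tfrac{3}{2}\right)^{k}\sigma > \frac{\pi}{6}
\iff
k \,\log\tfrac{3}{2} > \log\frac{\pi}{6\sigma}
\iff
k > \frac{\log\frac{\pi}{6\sigma}}{\log\frac{3}{2}},
\qquad
\frac{1}{\log_{10}\frac{3}{2}} \approx \frac{1}{0.1761} \approx 5.68 .
\]
```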



4 Number of Similarity Classes of Triangles 

We are ready to prove the main theorem: 

Theorem 1. Let ABC be a triangle and $\sigma$ its smallest angle. 

1. The number of steps to be executed by the bisection method until no more 
non-similar triangles are generated is $O(\sigma^{-1})$. 

2. If C is above arc $C_3$, then the number of non-similar triangles generated by 
the bisection method is $O(\log(\sigma^{-1}))$. 

3. The number of non-similar triangles generated by the bisection method is 
$O((\sigma^{-1})^{\log(\sigma^{-1})})$. 

Proof. 1. Let us calculate the maximum number of steps to be executed before 
arriving at region I in the worst case. This occurs for triangles in regions IV 
or V. A rough upper bound on the number of steps is given by the sum 
$2 + \lceil 5.7\log(\frac{\pi}{6\sigma}) \rceil + \lceil (\gamma - \pi/2)/\sigma \rceil$. This number is asymptotically linear 
in $\sigma^{-1}$ because $\pi/3 \le \gamma < \pi$. 

2. For a triangle ABC above arc $C_3$, the number N(ABC) of non-similar 
triangles is 1 + N(ADC) (the 1 corresponds to $\triangle DBC$). The statement follows 
from Lemma 2, items 1, 2 and 3. 

3. The complex case is region IV. (The analysis for region V is similar.) Here 
N(ABC) = N(ADC) + N(CDE). First let us prove that $\sigma_{ADC} \ge \frac{3}{2}\sigma_{ABC}$. If 
C is to the left of MO, then $\sigma_{ADC}$ is the angle $\angle ADC$ and by Lemma 1 we 
are done. Next consider the geometric place of the set of points C such that 
$\beta_{ABC} = \frac{3}{2}\sigma_{ABC}$. This is a line L passing through D with negative slope. If C 
lies to the right of L, then $\triangle ADC$ will be in region IV to the left of MO and 
we are in the previous case in one step. If C lies in between L and MO, then 
$\sigma_{ADC} = \angle CAD = \beta_{ABC} \ge \frac{3}{2}\sigma_{ABC}$ by definition. 

Now, using the fact that both triangles ADC and CDE have $\gamma$ diminished 
by $\sigma$, the fact already proven that $\sigma_{ADC} \ge \frac{3}{2}\sigma_{ABC}$, and observing that 
$\sigma_{DBC} \ge \sigma_{ABC}$, we have the following recurrence equation for the number $N(\gamma, \sigma)$ of 
non-similar triangles generated: 

$N(\gamma, \sigma) = N(\gamma - \sigma, \frac{3}{2}\sigma) + N(\gamma - \sigma, \sigma)$, 

and Lemma 2.4 gives a bound to the number of necessary steps to take. It is 
not difficult to see that this recurrence essentially reduces to one of the type 
f(n) = f(n/2) + f(n − 1). This recurrence has no polynomial solution, and 
$O(n^{\log n})$ is an upper bound, from where we get the bound $O((\sigma^{-1})^{\log(\sigma^{-1})})$. 
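The growth of a recurrence of this type can be tabulated directly. The sketch below fixes details the text leaves open, so they are assumptions: n/2 is rounded down, and f(0) = f(1) = 1 is used as the base case. The name `f_table` is hypothetical.

```python
def f_table(n_max):
    """Values f(0..n_max) of f(n) = f(n // 2) + f(n - 1), with f(0) = f(1) = 1."""
    f = [1, 1]
    for n in range(2, n_max + 1):
        f.append(f[n // 2] + f[n - 1])
    return f


values = f_table(8)
```

Printing `f_table(n)[n]` for n = 2, 4, 8, 16, ... gives a quick empirical view of how fast the sequence grows under these assumptions.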

It is interesting to note that not only is the number of non-similar triangles 
generated by the bisection method finite, but a stronger result can be proved: 

Proposition 1. The bisection method generates a finite number of different di- 
rections in the plane. Moreover, in the worst case this number is 0(a'^). 

Proof. Using Theorem 1, it is enough to show that in each step only finitely many 
new directions are added, and similar triangles generated use already generated 
directions. But we already know these facts from the analysis of the regions: at 
each step only one new direction is added except in regions IV and V where the 
number of directions is (possibly) doubled. Hence, a gross upper bound for the 
worst case is given by 0{a'^). 



5 Classical Results Revisited 

Using only elementary geometric methods it is possible to re-prove classical 
results about the smallest angle and parallel iterative bisection in the bisection 
method. 

Theorem 2. 1. The bisection method gives $\mu_{ABC} \ge \frac{1}{2}\alpha_{ABC}$, where $\mu_{ABC}$ is
the smallest angle in the mesh obtained by iteratively bisecting triangle $ABC$.
For triangles below arc $C_2$ it holds that $\mu_{ABC} = \alpha_{ABC}$.
2. For each triangle, no more than 5 bisections (2 steps) are necessary in order
to diminish the longest side (called diameter) by one half. For simultaneous
parallel bisections of all triangles in the mesh, it holds that $d_j \le c\,2^{-j/2} d_0$ for
a small constant $c$ depending on the regions and $d_j$ the diameter after $j$
(parallel) bisections.

Proof. 1. First, checking case by case it follows that for triangles in regions below
$C_2$ it always holds that $\alpha_{ADC} \ge \alpha_{ABC}$ and $\alpha_{DBC} \ge \alpha_{ABC}$. Second, for triangles in
region III, the new triangle $ADC$ has $\alpha_{ADC} \ge \frac{1}{2}\alpha_{ABC}$ (because $\alpha_{ADC} = \angle ADC$
and Lemma 1), and clearly $\alpha_{DBC} \ge \alpha_{ABC}$. For triangles $ABC$ in region II,
observe that $\alpha_{ABC} < \pi/6$ and $\alpha_{ADC} = \angle ACD \ge \alpha_{ABC}$. Finally, once a triangle
is in region I, we have Figure 5, the worst case being when $C = P$.

2. The first sentence is an easy observation, the worst case being triangles in 
region I. 

As for the diameter bound, using the formula for the area of a triangle, $A = \frac{1}{2}bh$,
and the fact that the area decreases exactly by half after a bisection, one gets
immediately $b_j h_j = 2^{-j} b_0 h_0$, where the sub-indexes indicate sides corresponding to
a triangle in the $j$-th (parallel) bisection.

Now the key point is to observe that: (i) for triangles whose vertex $C$ is
below arcs $C_2$ or $C_4$ the diameter decreases by half after two parallel bisections,
i.e. $d_2 \le d_0/2$; and (ii) the fact we already know that, as bisection progresses,
triangles go “up” the level of arcs $C_4$ and $C_2$. Hence, $h_j$ can be bounded (in terms
of $b_j$) because from the fact mentioned above we can deduce that $C_j$ is no
smaller than, say, $\pi/7$. Similarly, $h_0$ has a fixed bound in terms of $b_0$ (the worst
case being $\sqrt{3}\,b_0/2$). Using these formulas we get $b_j^2 \le c^2 b_0^2 2^{-j}$, for some constant
$c < \sqrt{3}$ (cf. also [1]). From here, taking square roots we get the statement of the
theorem.

6 Conclusion 

We presented a taxonomy of triangles in the plane which captures the behavior 
of the bisection method. Besides allowing us to prove complexity results for the 
bisection method, this classification is useful to refine bounds for each class of 
triangles, and to determine more precisely lower bounds on the smallest angle
$\mu_{ABC}$ in the mesh, as well as the number of non-similar triangles generated. The
analysis could be further refined considering regions we did not separate, e.g.
below arc $C_2$, above arc $C_3$ and to the left of $MO$ in Figure 4. Further work
includes use of this theoretical analysis to refine algorithms of bisection (4-edge 
partition, simple bisection, etc.) according to the type of triangle found in each 
iteration. 
Abstract. The problem of finding the maxima of a point set plays
a fundamental role in computational geometry. Based on the idea of
the certificates of exclusion, two algorithms are presented to solve
the maxima problem under the assumption that $N$ points are chosen
from a $d$-dimensional hypercube uniformly and each component of a
point is independent of all other components. The first algorithm runs
in $O(N)$ expected time and finds the maxima using
$dN + d\ln N + d^2 N^{1-1/d}(\ln N)^{1/d} + O(dN^{1-1/d})$ expected scalar comparisons.
The experiments show the second algorithm has a better expected running
time than the first algorithm while a tight upper bound of the
expected running time is not obtained. A third maxima-finding
algorithm is presented for $N$ points with a $d$-dimensional component
independence distribution, which runs in $O(N)$ expected time and uses
$2dN + O(\ln N(\ln(\ln N))) + d^2 N^{1-1/d}(\ln N)^{1/d} + O(dN^{1-1/d})$ expected
scalar comparisons. The substantial reduction of the expected running
time of all three algorithms, compared with some known linear expected-
time algorithms, has been attributed to the fact that a better certificate
of exclusion has been chosen and more non-maximal points have been
identified and discarded.



1 Preliminaries 

The problem of finding all maxima of a point set plays a fundamental role in
computational geometry since the maxima represent one of the characteristics
of the point set, and this problem is closely related to the convex hull problem. The
problem occurs in many applications of diverse disciplines such as statistics,
graphics, data analysis, economics, etc. Basically, a maximum is a point that is
not dominated by any other point in the same point set. Domination is defined
as follows by Preparata and Shamos [PS85]: Given a set $S$ of $N$ points, all
points belong to the $d$-dimensional Euclidean space $E^d$ with coordinates
$x_1, x_2, \ldots, x_d$. A point $p_1$ dominates a point $p_2$ if $x_i(p_2) \le x_i(p_1)$ for
$i = 1, 2, \ldots, d$. We write $p_2 \prec p_1$ when the point $p_1$ dominates the point $p_2$.
A point $p$ in $S$ is a maximal element of $S$ if there does not exist any point $q$
in $S$ such that $p \prec q$ and $p \ne q$. The maxima problem is to find all maximal
elements (maxima) of a set $S$ with respect to dominance.
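The definition of dominance and of a maximal element can be made concrete with a short sketch. The code below is a direct quadratic-time transcription of the definition, not one of the algorithms discussed in this paper; the function names `dominates` and `naive_maxima` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p1: Sequence[float], p2: Sequence[float]) -> bool:
    """True if p1 dominates p2, i.e. x_i(p2) <= x_i(p1) for every coordinate i."""
    return all(b <= a for a, b in zip(p1, p2))


def naive_maxima(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any different point (O(d * N^2) time)."""
    return [
        p for p in points
        if not any(q != p and dominates(q, p) for q in points)
    ]


if __name__ == "__main__":
    sample = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
    # (2, 2) and (1, 1) are dominated by (3, 3); the other three are maxima.
    print(naive_maxima(sample))
```

This brute-force version is useful mainly as a reference implementation against which faster methods can be checked.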
Kung et al. [KLP75] showed that any algorithm that solves the maxima problem
in two and three dimensions requires $\Omega(N \log N)$ time in the comparison-tree
model. By using a divide-and-conquer approach, Kung et al. [KLP75] presented
an algorithm to find all maxima for a set of $N$ points in $E^d$, whose worst-case
running time is $O(N(\log N)^{d-2}) + O(N \log N)$. This algorithm is referred to as
the KLP algorithm here. Therefore, in the cases of 2- and 3-dimensional spaces,
the running time of this algorithm will be bounded by $O(N \log N)$. Clearly, the
KLP algorithm is optimal for the 2- and 3-dimensional spaces. By applying a
multi-dimensional divide-and-conquer scheme, Bentley [Ben80] gave a simpler
description of the KLP algorithm.

When the expected value of any variable, such as running time, is considered,
the probability distributions of the coordinates of all input points must be
specified. Under the hypothesis that the $d$ components of each point are
independently drawn from continuous distributions, Bentley et al. [BKST78] showed that
the expected number of maxima of a set of $N$ points in $E^d$ is $O((\log N)^{d-1})$.
So the maxima are expected to be only a small part of the whole point set. A set
satisfying this hypothesis is said by Bentley to have a component independence (CI)
distribution. Note that points uniform over any rectilinearly oriented
hyperrectangle exhibit the CI property. Based on this hypothesis, Bentley et
al. [BKST78] presented an algorithm to compute the maxima set with a linear
expected running time $O(N)$, which is referred to as the BKST algorithm later.

Let us imagine what the set of maximal points of a set $S$ looks like from a
geometrical viewpoint. In the case of the 2-dimensional space, the set of the
maximal points of $S$ forms a structure which monotonically decreases in the
$y$-coordinates as the $x$-coordinates of the points increase. This kind of structure
is called a staircase structure. For a static point set $S$ in the 2-dimensional space,
it has been shown that the staircase structure can be computed in $O(N \log N)$ time.
Regarding a dynamic point set $S$, Overmars and Van Leeuwen [OL81] designed
a data structure which requires splitting and merging balanced trees when points
are inserted and deleted. For each insertion and deletion, the required time is
in $O(N \log^2 N)$. Both Frederickson and Rodger [FR90] and Janardan [Jan91]
developed a scheme which maintains the staircase structure of a set of maxima
and allows insertion in $O(\log N)$ time and deletion in $O(\log^2 N)$ time. In 1994,
Kapoor [Kap94] designed an improved data structure, which maintains the
staircase structure in $O(\log N)$ time.
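For the static 2-dimensional case, the $O(N \log N)$ bound can be reached by a standard sort-and-sweep. The sketch below is one common way to do it and is not taken from any of the cited papers; the function name `staircase` is hypothetical, and the input points are assumed to be distinct.

```python
from typing import List, Tuple


def staircase(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Maxima of distinct 2-D points, returned in increasing x (decreasing y) order.

    Sort by decreasing x (ties: decreasing y), then sweep once while tracking
    the largest y seen so far. The sort dominates the cost: O(N log N).
    """
    result = []
    best_y = float("-inf")
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        # Every point already visited has x at least as large, so the current
        # point is maximal exactly when its y exceeds all y values seen so far.
        if y > best_y:
            result.append((x, y))
            best_y = y
    result.reverse()
    return result


if __name__ == "__main__":
    print(staircase([(2, 2), (3, 3), (1, 5), (1, 1), (2, 4)]))
```

The returned list has strictly increasing x and strictly decreasing y, which is the staircase shape described above.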

Because the expected number of maxima of a set $S$ of $N$ points is
$O((\log N)^{d-1})$ for a CI distribution, most of the points in $S$ are not maxima
when $N$ is large. This is also demonstrated by the staircase structure of maxima
in the 2-dimensional space. So it is possible to pick an appropriate
point and use this point to rule out all points dominated by it, which are not
maxima of the set $S$. In 1990, this insight was pointed out by Bentley et al.
[BCL90] as follows: “A certificate of exclusion typically can quickly demonstrate
that most of the N input points are not in the final output”. They studied the
case that $N$ input points (with CI property) are distributed uniformly within a
$d$-dimensional ($d \ge 2$) hypercube and presented an algorithm to find the
maxima of these $N$ points in $O(N)$ expected time, which is referred to as the BCL
hypercube algorithm here. Bentley et al. [BCL90] also designed an algorithm
for finding the maxima of $N$ points chosen from a $d$-dimensional CI distribution
in $O(N)$ expected time, which is referred to as the BCL algorithm from
now on.



2 Improved BCL Hypercube Algorithm-1 

ImprovedBCLHypercube-1($S$, $d$)
    $A_1 \leftarrow \emptyset$, $B_1 \leftarrow \emptyset$, $C_1 \leftarrow \emptyset$, $p_1 \leftarrow$ the point $(1 - (\ln N/N)^{1/d}, \ldots, 1 - (\ln N/N)^{1/d})$;
    // Initialize three sets $A_1$, $B_1$, and $C_1$ to be empty and choose a certificate point $p_1$.
    for each point $q$ in the set $S$, do
        if $q \prec p_1$ and $q \ne p_1$, $A_1 \leftarrow A_1 \cup \{q\}$;
        // The set $A_1$ contains all points different from $p_1$ that are dominated by $p_1$.
        else if $p_1 \prec q$, $C_1 \leftarrow C_1 \cup \{q\}$;
        // The set $C_1$ contains all points that dominate $p_1$.
        else $B_1 \leftarrow B_1 \cup \{q\}$;
        // The set $B_1$ contains all points which are incomparable (with respect to “$\prec$”) to $p_1$.
    if $C_1 = \emptyset$, compute the maxima of the set $S$ by the KLP algorithm and return;
    else do
        find the point $p$ that has the maximum of $x_1(p) \cdot x_2(p) \cdot x_3(p) \cdots x_d(p)$ among all points in the set $C_1$;
        $A \leftarrow \emptyset$, $B \leftarrow \emptyset$, $C \leftarrow \emptyset$;
        for each point $q$ in the set $B_1 \cup C_1$, do
            if $q \prec p$ and $q \ne p$, $A \leftarrow A \cup \{q\}$;
            // The set $A$ contains all points different from $p$ that are dominated by $p$.
            else if $p \prec q$, $C \leftarrow C \cup \{q\}$;
            // The set $C$ contains all points that dominate $p$.
            else $B \leftarrow B \cup \{q\}$;
            // The set $B$ contains all points which are incomparable (with respect to “$\prec$”) to $p$.
        compute the maxima of the set $B \cup C$ by the BKST algorithm and return.



As shown above, improved BCL hypercube algorithm-1 is based on the BCL
hypercube algorithm and developed under the assumption that the points of the
set $S$ (with CI property) are distributed uniformly within a $d$-dimensional ($d \ge 2$)
hypercube. The basic idea underlying improved BCL hypercube algorithm-1 is the
same as that of the BCL hypercube algorithm, except that a second certificate
of exclusion is applied to identify further non-maximal points to be discarded.

This basic idea can be illustrated in Figure 1 for a 2-dimensional unit square.
First, a point $p_1 = (1 - (\ln N/N)^{1/d}, 1 - (\ln N/N)^{1/d})$ is chosen as the first
certificate of exclusion for the set $S$, as the BCL hypercube algorithm does. As
Figure 1(a) shows, by using the point $p_1$, the whole unit hypercube can be
partitioned into three sets $A_1$, $B_1$ and $C_1$, which have the following properties.
All points in $A_1$ are dominated by the point $p_1$, while all points in
$C_1$ dominate the point $p_1$. The point $p_1$ is incomparable (with respect to “$\prec$”)
to each point in the set $B_1$. If the set $C_1$ is empty, the KLP algorithm will be
applied to compute the maxima of the set $S$. If the set $C_1$ is non-empty,
then because the point $p_1$ dominates all points in the set $A_1$ and is itself
dominated by every point in $C_1$, none of the points in $A_1$ can
be a maximal element of the set $S$ and therefore they can be discarded. So
only the points in the set $B_1 \cup C_1$ are left for the continuing computation. The
underlying idea in improved BCL hypercube algorithm-1 is that a better point
$p$ chosen from the points in the set $C_1$ can be used as the second certificate of
exclusion to identify the non-maximal points in the set $B_1 \cup C_1$. As improved
BCL hypercube algorithm-1 indicates, this point $p$ is chosen from the set $C_1$ and
has the maximum value of $x_1(p) \cdot x_2(p) \cdot x_3(p) \cdots x_d(p)$ among all points in the
set $C_1$. After the point $p$ is used to partition the set $B_1 \cup C_1$, three
subsets $A$, $B$ and $C$ will be obtained again, as shown in Figure 1(b). Clearly, all
points in the set $A$ are dominated by the point $p$ and the set $C$ contains only
one point, $p$. The set $B$ contains all points incomparable to the point $p$. So all
points in the set $A$ can be discarded again and only the points in the set $B \cup C$
need to be processed. Finally, the BKST algorithm is applied to the set $B \cup C$
to compute the maxima set of the set $S$. The effectiveness of improved BCL
hypercube algorithm-1 is determined by the number of points remaining after
the second partition, because these points are kept for the subsequent computation of
maxima by the BKST algorithm, although an extra cost has to be paid
for such a partition.
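The two-certificate scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the KLP and BKST subroutines are replaced by a quadratic stand-in (`naive_maxima`), the second partition is taken over $B_1 \cup C_1$ as the text describes, and all function names are hypothetical.

```python
import math
import random


def dominates(p, q):
    """True if p dominates q, i.e. q_i <= p_i for every coordinate i."""
    return all(b <= a for a, b in zip(p, q))


def naive_maxima(points):
    """Quadratic stand-in for the KLP and BKST subroutines (assumption of this sketch)."""
    return [p for p in points if not any(q != p and dominates(q, p) for q in points)]


def improved_bcl_hypercube_1(points, d):
    """Maxima of points in the unit d-cube using two certificates of exclusion."""
    n = len(points)
    if n == 0:
        return []
    c = 1.0 - (math.log(n) / n) ** (1.0 / d)
    p1 = (c,) * d  # first certificate of exclusion
    b1, c1 = [], []
    for q in points:
        if q != p1 and dominates(p1, q):
            continue  # q belongs to A1; it is discarded only if C1 is non-empty
        if dominates(q, p1):
            c1.append(q)
        else:
            b1.append(q)
    if not c1:
        return naive_maxima(points)  # rare case; the text uses KLP here
    p = max(c1, key=math.prod)  # second certificate: largest coordinate product
    survivors = [q for q in b1 + c1 if q == p or not dominates(p, q)]
    return naive_maxima(survivors)  # the text uses BKST on B ∪ C here


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [tuple(rng.random() for _ in range(4)) for _ in range(2000)]
    print(len(improved_bcl_hypercube_1(pts, 4)))
```

Discarding $A_1$ is safe only when $C_1$ is non-empty, which is why the sketch falls back to the full point set otherwise.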
Fig. 1. An illustration of partitioning in improved BCL hypercube algorithm-1 for a
2-dimensional unit square: (a) the first partition; (b) the second partition.



The analysis of the expected run time and expected number of comparisons
of improved BCL hypercube algorithm-1 is done as follows. At the beginning, $Nd$ scalar
comparisons are needed to partition the set $S$ into three parts: $A_1$, $B_1$ and $C_1$.
Then two cases need to be considered. As indicated by Bentley et al. [BCL90],
first, the case that the set $C_1$ is empty occurs with a probability at most $1/N$ on
the average and the expected run time under this case to compute the maxima
of the set $S$ by applying the KLP algorithm is $O((\log N)^{d-2}) + O(\log N)$. Second,
the case that the set $C_1$ is non-empty occurs with a probability at least $1 - 1/N$.
Because the set $C_1$ is represented as a hypercube with volume $\ln N/N$, the
expected number of points in the set $C_1$ is $\ln N$. So, on the average, $(d - 1)\ln N$
multiplications are needed to compute the quantity $x_1(q) \cdot x_2(q) \cdot x_3(q) \cdots x_d(q)$
for each point $q$ in the set $C_1$. To find the maximum of this quantity, $\ln N - 1$
expected scalar comparisons are required. Then, the point $p$ is used to partition
the set $B_1 \cup C_1$ into three subsets $A$, $B$ and $C$, and the BKST algorithm is
applied to the set $B \cup C$. Because the expected number of points in the set
$B_1 \cup C_1$ is bounded by $dN^{1-1/d}(\ln N)^{1/d}$ under the case that $C_1$ is non-empty,
the total extra cost of improved BCL hypercube algorithm-1 will be bounded by
$d\ln N + d + d^2 N^{1-1/d}(\ln N)^{1/d}$. The expected run time of applying the BKST
algorithm to the set $B \cup C$ is determined by its size. Under the assumption that
the $N$ points in the set $S$ are distributed uniformly within the hypercube, the size
of the set $B \cup C$ is bounded above by $N - N(1 - (1/N)^{1/d})^d$. Let $a = (1/N)^{1/d}$;
the bound can then be simplified as follows:

$N - N(1 - (1/N)^{1/d})^d = N - N(1 - a)^d \le N(1 - (1 - da))$, by Bernoulli's inequality,
$= Nda = dN^{1-1/d}$.
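The Bernoulli step can be checked numerically. The following sketch simply evaluates both sides of the inequality for a few values of $N$ and $d$; the function name `survivor_bounds` is hypothetical.

```python
def survivor_bounds(n: int, d: int):
    """Return (exact, relaxed) where
    exact   = N - N * (1 - (1/N)**(1/d))**d   (bound before Bernoulli's inequality)
    relaxed = d * N**(1 - 1/d)                (bound after Bernoulli's inequality)
    """
    a = (1.0 / n) ** (1.0 / d)
    exact = n - n * (1.0 - a) ** d
    relaxed = d * n ** (1.0 - 1.0 / d)
    return exact, relaxed


if __name__ == "__main__":
    for n in (10**3, 10**5, 10**6):
        for d in (2, 4, 6):
            exact, relaxed = survivor_bounds(n, d)
            # Bernoulli: (1 - a)**d >= 1 - d*a for 0 <= a <= 1, so exact <= relaxed.
            print(n, d, round(exact), round(relaxed))
```

For large $N$ the two values are close, since $a = (1/N)^{1/d}$ is small and the higher-order terms of $(1 - a)^d$ contribute little.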



According to Bentley et al. [BKST78], the expected number of comparisons
will be bounded by $O(dN^{1-1/d})$ if the BKST algorithm, which has a
linear expected run time, is used to find the maxima of the subset $B \cup C$.
Finally, the expected number of comparisons used by improved BCL hypercube
algorithm-1 to compute all maxima of a point set $S$ with $N$ points
under the assumption stated at the beginning will be bounded by
$dN + d\ln N + d^2 N^{1-1/d}(\ln N)^{1/d} + O(dN^{1-1/d})$ and the expected run time is $O(N)$.
Compared with the expected number of comparisons $dN + O(dN^{1-1/d}(\ln N)^{1/d})$ used by
the BCL hypercube algorithm, as claimed by Bentley et al. [BCL90], improved BCL
hypercube algorithm-1 has a better performance on the average if the hidden
coefficient in the asymptotic notation of the BKST algorithm is large enough.

3 Improved BCL Hypercube Algorithm-2 

ImprovedBCLHypercube($S$, $d$)
    compute the maxima of the set returned by PARTITION($S$, $d$) by the BKST algorithm and return.

PARTITION($S$, $d$)
    if the size of the set $S \le 1$, return $S$.
    for each point $q$ in the set $S$, do
        compute $x_1(q) \cdot x_2(q) \cdot x_3(q) \cdots x_d(q)$;
    find the point $p$ that has the maximum product $x_1(p) \cdot x_2(p) \cdot x_3(p) \cdots x_d(p)$ among all points in the set $S$;
    $A \leftarrow \emptyset$, $B_1 \leftarrow \emptyset$, $B_2 \leftarrow \emptyset$, $\ldots$, $B_d \leftarrow \emptyset$;
    for each point $q$ in the set $S$, do
        if $q \prec p$ and $q \ne p$, $A \leftarrow A \cup \{q\}$;
        else
            if $x_i(p) < x_i(q)$, $B_i \leftarrow B_i \cup \{q\}$;
    return $\{p\} \cup$ PARTITION($B_1$, $d$) $\cup$ PARTITION($B_2$, $d$) $\cup \cdots \cup$ PARTITION($B_d$, $d$).

As shown above, improved BCL hypercube algorithm-2 is extended from the
previously presented algorithm and developed under the same assumption. The
basic idea upon which improved BCL hypercube algorithm-2 is based is the same
as that of improved BCL hypercube algorithm-1, except for the approach used to identify the
non-maximal points. Instead of using the certificate of exclusion only twice, this
extended algorithm searches for the non-maximal points by a recursive procedure
PARTITION($S$, $d$). This procedure first computes the product
$x_1(q) \cdot x_2(q) \cdot x_3(q) \cdots x_d(q)$ for each point $q$ in the set $S$ and picks the point $p$ which
has the maximum product $x_1(p) \cdot x_2(p) \cdot x_3(p) \cdots x_d(p)$ as the certificate of
exclusion for the set $S$. Then, it uses the point $p$ to partition the set $S$ into the
subsets $\{p\}, A, B_1, B_2, \ldots, B_d$. Obviously, all points in the set $A$ are dominated
by the point $p$ and, therefore, can be discarded. After that, the PARTITION($S$,
$d$) procedure partitions the sets $B_1, B_2, \ldots, B_d$ recursively to identify more
non-maximal points and returns the combined subsets. Finally, improved BCL
hypercube algorithm-2 computes the maxima set of the input set $S$ by applying the BKST
algorithm to the set returned by the PARTITION($S$, $d$) procedure.
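The recursive PARTITION procedure can be sketched as follows. Two details are assumptions of this sketch rather than statements from the text: a point not dominated by $p$ is placed in the bucket $B_i$ for the smallest index $i$ with $x_i(p) < x_i(q)$ (the pseudocode does not say which $i$ to use when several qualify), and a quadratic routine stands in for the BKST algorithm. Function names are hypothetical.

```python
import math
import random


def dominates(p, q):
    """True if p dominates q, i.e. q_i <= p_i for every coordinate i."""
    return all(b <= a for a, b in zip(p, q))


def naive_maxima(points):
    """Quadratic stand-in for the BKST subroutine (assumption of this sketch)."""
    return [p for p in points if not any(q != p and dominates(q, p) for q in points)]


def partition(points, d):
    """Recursively discard points dominated by the max-product certificate."""
    if len(points) <= 1:
        return list(points)
    p = max(points, key=math.prod)  # certificate of exclusion for this subset
    buckets = [[] for _ in range(d)]
    for q in points:
        if dominates(p, q):
            continue  # q is p itself or lies in A; p is re-added below
        # Assumed tie-break: first coordinate in which q exceeds p.
        i = next(i for i in range(d) if p[i] < q[i])
        buckets[i].append(q)
    kept = [p]
    for bucket in buckets:
        kept.extend(partition(bucket, d))
    return kept


def improved_bcl_hypercube_2(points, d):
    """Maxima via recursive partitioning followed by a final maxima pass."""
    return naive_maxima(partition(points, d))


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [tuple(rng.random() for _ in range(4)) for _ in range(5000)]
    print(len(partition(pts, 4)), len(improved_bcl_hypercube_2(pts, 4)))
```

Each recursive call removes at least the certificate $p$ from the working set, so the recursion terminates; every maximal point survives because a point is only discarded when some other point dominates it.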
Fig. 2. The run times of the BCL hypercube algorithm and its improved versions
versus the number of points in the set $S$: (a) in the 4-dimensional hypercube; (b) in
the 6-dimensional hypercube.



4 Comparisons between Hypercube Algorithms 

Experiments have been performed for the BCL hypercube algorithm, improved
BCL hypercube algorithm-1 and algorithm-2 respectively. The input point set
$S$ has a random uniform distribution in the range from 0 to 1 on each coordinate,
and the coordinates are independent of each other. The random number generator used in the
implementation is the one recommended by Press et al. [PTVF93] and L’Ecuyer et al.
[L’E88]. It combines two random generators. Both of them
use the same algorithm with the Bays-Durham shuffle and added safeguards but
with different parameters. The period of the combined generator is $2.3 \times 10^{18}$,
which makes period exhaustion practically impossible. Figures 2(a) and 2(b) show
the curves of the run times versus the set size $N$ for all three algorithms for the
4- and 6-dimensional hypercubes respectively. These figures indicate that, for
both the 4- and 6-dimensional hypercubes, improved BCL hypercube algorithm-1
performs better than the BCL hypercube algorithm, and improved BCL
hypercube algorithm-2 has the best performance among all three algorithms for
a given input point set $S$, whose size increases step by step.
Table 1. The number of points in the sets $A$, $B \cup C$, $C$ and PARTITION($S$, $d$) after
the partitions in the BCL hypercube algorithm and its improved versions for the
4-dimensional hypercube.
To explain the difference among the performances of improved BCL hypercube
algorithm-1, algorithm-2 and the BCL hypercube algorithm, the numbers
of points in the sets $A$, $B \cup C$, $C$ and PARTITION($S$, $d$), which are obtained
after the certificates of exclusion are applied in all three algorithms for different
input point sets for the 4-dimensional hypercube, are presented in Table 1.

For improved BCL hypercube algorithm-1 and the BCL hypercube algorithm,
the application of the certificates of exclusion partitions the whole set $S$ into
subsets and discards the identified non-maximal points. Finally, only the
set $B \cup C$ is processed by the BKST algorithm to obtain the maxima of the set $S$.
Therefore, compared with the BCL hypercube algorithm, the
effectiveness of improved BCL hypercube algorithm-1 is determined by
whether the size of the final input set $B \cup C$ for the BKST
algorithm is reduced to such an extent that the saving from the additional
non-maximal points identified outweighs the extra cost paid for the second partition. The data in Table
1 give a positive answer to this question. For instance, for the point set
$S$ with 5000000 points in the 4-dimensional hypercube, the set $B \cup C$ in the
BCL hypercube algorithm has 128421 points after the first partition while the
set $B \cup C$ in improved BCL hypercube algorithm-1 has 10839 points after the
second partition. The same holds for the point sets $S$ of the other
sizes. This fact supports the basic idea in improved BCL hypercube algorithm-1
that more points can be discarded if a better certificate of exclusion in the
set $C$ is applied after the first partition. Figure 3(a) shows the ratio between
the size of the set $B \cup C$ in improved BCL hypercube algorithm-1 and that
in the BCL hypercube algorithm based on the data in Table 1. As this figure
indicates, the ratio decreases as the number of points in the set $S$ increases. This is
understandable because, on the average, the size of the set $B \cup C$ in
improved BCL hypercube algorithm-1 is bounded by $dN^{1-1/d}$ and that of the
set $B \cup C$ in the BCL hypercube algorithm is $dN^{1-1/d}(\ln N)^{1/d}$. So their ratio is
bounded by $(\ln N)^{-1/d}$. Obviously, on the average, when $N$ increases, this ratio
will decrease. This is consistent with the trend in Figure 3(a).
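As a quick arithmetic check, the bound $(\ln N)^{-1/d}$ can be compared with the ratio implied by the two set sizes quoted above for $N = 5000000$ and $d = 4$ (10839 and 128421 points). The helper name `ratio_bound` is hypothetical.

```python
import math


def ratio_bound(n: int, d: int) -> float:
    """Upper bound (ln N)**(-1/d) on the ratio of the two |B ∪ C| sizes (needs N > 1)."""
    return math.log(n) ** (-1.0 / d)


# Sizes quoted in the text for N = 5,000,000 points in the 4-dimensional hypercube.
observed_ratio = 10839 / 128421

if __name__ == "__main__":
    print(round(observed_ratio, 4), round(ratio_bound(5_000_000, 4), 4))
    # The bound shrinks slowly as N grows.
    for n in (10**4, 10**6, 10**8):
        print(n, round(ratio_bound(n, 4), 4))
```

The observed ratio lies well below the bound, which suggests the bound is loose but still captures the decreasing trend.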

Regarding improved BCL hypercube algorithm-2, the application of the
certificates of exclusion recursively partitions the whole set $S$ into subsets and
discards the identified non-maximal points. Therefore, only the set resulting from
the PARTITION($S$, $d$) procedure is processed by the BKST algorithm and the
performance of improved BCL hypercube algorithm-2 is determined by the
size of this set plus the extra cost for the recursive partition. From Table 1, it
can be seen that the PARTITION($S$, $d$) procedure identifies most of the
non-maximal points. For example, for the point set $S$ with 5000000 points, the BKST
algorithm only needs to process 847 points returned by the PARTITION($S$, $d$)
procedure, while it needs to process 128421 points in the BCL hypercube
algorithm. As Figure 2(a) shows, the much smaller run time of improved BCL
hypercube algorithm-2 compared with that of the BCL hypercube algorithm also shows
that the extra cost of the recursive partition is much less significant than the gain
resulting from it. Figure 3(b) shows the ratio between the size of the
final input set for the BKST algorithm in improved BCL hypercube algorithm-2 and that
in the BCL hypercube algorithm based on the data in Table 1. As this figure
indicates, the ratio also decreases as the number of points in the set $S$ increases.
Although a tight upper bound on the size of this set cannot be obtained at this
stage, a rough estimate is available because this size is also bounded by
$dN^{1-1/d}$, which follows from the previous analysis. Therefore, the ratio is
bounded by $(\ln N)^{-1/d}$, which indicates the trend in Figure 3(b) although this
bound is very loose.
Fig. 3. The ratio between the size of $B \cup C$ in improved BCL hypercube algorithm
and that in the BCL hypercube algorithm for the 4-dimensional unit hypercube: (a)
improved BCL hypercube algorithm-1; (b) improved BCL hypercube algorithm-2.



5 Improved BCL Algorithm 

ImprovedBCL($S$, $d$)
    $A_1 \leftarrow \emptyset$, $B_1 \leftarrow \emptyset$, $C_1 \leftarrow \emptyset$, $p_1 \leftarrow$ the point $(x_1, x_2, \ldots, x_d)$, where $x_i$ is the $N(\ln N/N)^{1/d}$-th largest element on the $i$th dimension;
    // Initialize three sets $A_1$, $B_1$, and $C_1$ to be empty and choose a certificate point $p_1$.
    for each point $q$ in the set $S$, do
        if $q \prec p_1$ and $q \ne p_1$, $A_1 \leftarrow A_1 \cup \{q\}$;
        // The set $A_1$ contains all points different from $p_1$ that are dominated by $p_1$.
        else if $p_1 \prec q$, $C_1 \leftarrow C_1 \cup \{q\}$;
        // The set $C_1$ contains all points that dominate $p_1$.
        else $B_1 \leftarrow B_1 \cup \{q\}$;
        // The set $B_1$ contains all points which are incomparable (with respect to “$\prec$”) to $p_1$.
    if $C_1 = \emptyset$, compute the maxima of the set $S$ by the KLP algorithm and return;
    else do
        find the point $p$ that has the maximum of OrderStatistic($x_1(p)$) $\cdot$ OrderStatistic($x_2(p)$) $\cdot$ OrderStatistic($x_3(p)$) $\cdots$ OrderStatistic($x_d(p)$);
        // OrderStatistic($x_i(p)$) is defined as $n$ if $x_i(p)$ is the $n$th smallest element on the $i$th
        // dimension for all points in the set $C_1$.
        $A \leftarrow \emptyset$, $B \leftarrow \emptyset$, $C \leftarrow \emptyset$;
        for each point $q$ in the set $B_1 \cup C_1$, do
            if $q \prec p$ and $q \ne p$, $A \leftarrow A \cup \{q\}$;
            // The set $A$ contains all points different from $p$ that are dominated by $p$.
            else if $p \prec q$, $C \leftarrow C \cup \{q\}$;
            // The set $C$ contains all points that dominate $p$.
            else $B \leftarrow B \cup \{q\}$;
            // The set $B$ contains all points which are incomparable (with respect to “$\prec$”) to $p$.
        compute the maxima of the set $B \cup C$ by applying the BKST algorithm and return.



Based on the BCL algorithm, the improved BCL algorithm presented above is
developed under the assumption that the input point set $S$ has a component
independence (CI) distribution. The basic idea underlying the improved BCL
algorithm is the same as that of improved BCL hypercube algorithm-1, except that the
certificates of exclusion are chosen based on order statistics. First, as in the BCL
algorithm, a point $p_1$, each coordinate of which is the $N(\ln N/N)^{1/d}$-th
largest element on the corresponding dimension, is chosen as the first certificate of exclusion
for the set $S$. The order statistic is computed by using Floyd and Rivest’s
selection algorithm [FR75], which selects the $M$th largest element in a set
of $N$ elements with $N + \min(M, N - M) + o(N)$ expected comparisons. So
this step requires $dN$ scalar comparisons up to lower-order terms. By using the point $p_1$, the
whole set can be partitioned into three subsets $A_1$, $B_1$ and $C_1$, each of which has
the same characteristics as stated in the description of
improved BCL hypercube algorithm-1. This partition procedure uses $dN$ scalar
comparisons. Because the probability that any point in the set $S$ dominates
the point $p_1$ is $\ln N/N$ due to the CI property, two cases come up. First, the case
that the set $C_1$ is empty occurs with a probability at most $1/N$ on the average
and the expected run time under this case to compute the maxima of the
set $S$ by applying the KLP algorithm is $O((\log N)^{d-2}) + O(\log N)$, as indicated by
Bentley et al. [BCL90]. Second, the case that the set $C_1$ is non-empty occurs
with a probability at least $1 - 1/N$. In this case, all points in the set $A_1$ are
discarded and only the points in the set $B_1 \cup C_1$ are considered. Because of
the CI property, the number of points in the set $C_1$ is also $\ln N$ on the average.
Next, the second certificate of exclusion is chosen based on order statistics.
The order statistic of $x_i(q)$ for a point $q$ in the set $C_1$ is defined as $n$ if $x_i(q)$
is the $n$th smallest element on the $i$th dimension for all points in the set $C_1$.
This second certificate of exclusion is the point $p$ that has the maximum
of OrderStatistic($x_1(p)$) $\cdot$ OrderStatistic($x_2(p)$) $\cdot$ OrderStatistic($x_3(p)$) $\cdots$
OrderStatistic($x_d(p)$) among all points in the set $C_1$. Because this process uses
Quicksort to get the required point $p$, its expected run time is bounded by
$O(\ln N(\ln(\ln N)))$. After that, the set $B_1 \cup C_1$ is partitioned into three subsets
$A$, $B$ and $C$ by applying the point $p$, and this step takes $d\ln N$ scalar
comparisons in the average case. Then, only the points in $B \cup C$ are processed by the
BKST algorithm. Due to the CI property, the average case analysis of the improved
BCL algorithm is the same as that of improved BCL hypercube algorithm-1. Thus, we have
the following conclusion. The improved BCL algorithm finds the maxima of a set
$S$ of $N$ points with a $d$-dimensional CI distribution in $O(N)$ expected time
and uses $2dN + O(\ln N(\ln(\ln N))) + d^2 N^{1-1/d}(\ln N)^{1/d} + O(dN^{1-1/d})$ expected
scalar comparisons. It has a better performance on the average than the BCL
algorithm, whose expected number of scalar comparisons is $2dN + O(dN^{1-1/d}(\ln N)^{1/d})$,
not the $dN + O(dN^{1-1/d}(\ln N)^{1/d})$ stated by Bentley et al. [BCL90].
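The two order-statistic-based certificates can be sketched as below. Several choices are assumptions of this sketch and not taken from the text: sorting is used in place of Floyd and Rivest's selection algorithm, the rank $N(\ln N/N)^{1/d}$ is rounded up and clamped to $[1, N]$, coordinates are assumed distinct on each dimension (as holds with probability 1 for continuous distributions), and the function names are hypothetical.

```python
import math


def first_certificate(points, d):
    """p1 whose i-th coordinate is the M-th largest value on dimension i,
    with M = N * (ln N / N)**(1/d), rounded up and clamped (assumption)."""
    n = len(points)  # requires n >= 1
    m = math.ceil(n * (math.log(n) / n) ** (1.0 / d))
    m = max(1, min(n, m))
    # Sorting stands in for a linear expected-time selection algorithm.
    return tuple(
        sorted((q[i] for q in points), reverse=True)[m - 1] for i in range(d)
    )


def second_certificate(c1, d):
    """Point of C1 maximizing the product of its per-dimension ranks, where the
    rank of x_i(q) is n if it is the n-th smallest value on dimension i in C1."""
    ranks = []
    for i in range(d):
        ordered = sorted(q[i] for q in c1)
        ranks.append({value: r + 1 for r, value in enumerate(ordered)})
    return max(c1, key=lambda q: math.prod(ranks[i][q[i]] for i in range(d)))


if __name__ == "__main__":
    pts = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.8, 0.8)]
    print(first_certificate(pts, 2))
    print(second_certificate([(0.6, 0.9), (0.7, 0.8), (0.9, 0.6)], 2))
```

Using ranks instead of raw coordinate products makes the choice of the second certificate independent of the particular marginal distributions, which is what the CI assumption (rather than uniformity) calls for.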
Fig. 4. (a) The run times of improved BCL algorithm and the BCL algorithm versus
the number of points in the set $S$ in the 4-dimensional space; (b) the ratio between
the size of $B \cup C$ in improved BCL algorithm and that in the BCL algorithm for the
4-dimensional space.



Experiments on point sets with CI distribution (independent and uniformly
distributed coordinates) have been performed for the improved BCL algorithm and
the BCL algorithm respectively. Figure 4(a), which shows the curves of the run
times versus the set size $N$ for both algorithms for the 4-dimensional space,
indicates that the improved BCL algorithm performs better than the BCL
algorithm. To understand the difference between their performances, the numbers
of points in the three sets $A$, $B$ and $C$, which are obtained after the certificates
of exclusion are applied in both algorithms for different input point sets for the
4-dimensional space, are studied. As with improved BCL hypercube algorithm-1, the
effectiveness of the improved BCL algorithm is determined by whether the second
partition identifies enough non-maximal points to be worth the extra partition
cost. Figure 4(b) shows the ratio between the size of $B \cup C$ in the improved
BCL algorithm and that in the BCL algorithm. As it indicates, the ratio
decreases as the number of points in the set $S$ increases. Due to the CI property, this
happens because, on the average, the size of the set $B \cup C$ in the improved BCL
algorithm is bounded by $dN^{1-1/d}$ and that of the set $B \cup C$ in the BCL
algorithm is $dN^{1-1/d}(\ln N)^{1/d}$. So their ratio is bounded by $(\ln N)^{-1/d}$. Clearly, on
the average, when $N$ increases, this ratio will decrease.
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6 Conclusion 

All three presented algorithms which find the maxima of a point set are based 
on the idea of the certificates of exclusion indicated by Bentley et al. [BCL90] . 
Improved BCL hypercube algorithm-1 and improved BCL hypercube algorithm- 
2 are presented to solve this maxima problem under the assumption that all 
points are chosen from a d-dimensional hypercube uniformly (with Cl property) . 
Improved BCL hypercube algorithm-1 runs at a 0{N) expected time and finds 
the maxima using dN + dlnN + d^iV^“^/'^(ln -I- 0{dN^~^^‘^) expected 

scalar comparisons. The experiments^ show improved BCL hypercube algorithm- 
2 has a better expected running time than improved BCL hypercube algorithm- 
1 and the BCL hypercube algorithm. Our current efforts are to obtain a tight 
upper bound on the expected number of points finally computed by the BKST 
algorithm in improved BCL hypercube algorithm-2. Improved BCL algorithm is 
presented for a Cl distribution and has a expected running time 0{N) and uses 
2dN + 0(ln7V(ln(lniV))) -h d^ + OidN^-^/^^) expected scalar 
comparisons. As Table 1 and Figures 3(a), 3(b), and 4(b) show, the substantial
reduction in the expected running time of all three algorithms is attributable to
the fact that better certificates of exclusion are chosen, so that more non-maximal
points are identified and discarded during the computation. We conclude
this paper by noting the possibility of extending the basic idea on which improved
BCL hypercube algorithm-1, algorithm-2, and the improved BCL algorithm are
based to similar certificate-based convex-hull algorithms [BCL90].
Abstract. We consider an algorithmic problem that arises in manufacturing
applications. The input is a sequence of objects of various
types. The scheduler is fed the objects in the sequence one by one, 
and is equipped with a finite buffer. The goal of the scheduler/sorter 
is to maximally reduce the number of type transitions. We give the first 
polynomial-time constant approximation algorithm for this problem. We 
prove several lemmas about the combinatorial structure of optimal solu- 
tions that may be useful in future research, and we show that the unified 
algorithm based on the local ratio lemma performs well for a slightly 
larger class of problems than was apparently previously known. 



1 Introduction 

We consider an algorithmic problem that arises in some manufacturing applica- 
tions. The input is a sequence of objects of various types. The scheduler is fed 
the objects in the sequence one by one, and is equipped with a buffer that can 
hold up to k objects. When the scheduler receives a new object, the object is 
initially placed in the sorting buffer. Then the scheduler may eject any objects 
in the sorting buffer in arbitrary order. The sorting buffer may never hold more 
than k objects, and must end up empty. Thus the output from the sorting buffer 
is a permutation of the input objects. Informally, the goal of the scheduler is 
to minimize the number of transitions between objects of different type in the 
output sequence. 

An example situation where this problem arises is the Daimler-Benz car
plant in Sindelfingen, Germany [2]. Here the objects are cars, and the type of a
car is the final color that it should be painted. Particular cars must
be painted particular colors because of custom orders from customers and dealers.
For example, a customer can go to http://mbusa.com/ and order a G55 AMG
Mercedes-Benz SUV with any number of possible options, including the exterior
paint color. It is reported in [2] that the yield of the final-layer painting
depends mainly on the batch size of cars that have to be painted the same
color, and as a consequence the performance of these sorting buffers can have a
great impact on the overall performance of the manufacturing process.

For concreteness we will adopt terminology appropriate for the Daimler-Benz 
car plant example, and consider the types of the objects to be colors. The most 
obvious objective function would be to minimize the number of transitions be- 
tween objects of different color in the output sequence. The corresponding maxi- 
mization objective function would be to maximize the number of color transitions 
removed from the sequence. While it may not be completely obvious, it is not 
too difficult to see that in an optimal solution there are no color changes in- 
troduced into the output sequence that were not in the input sequence. Hence, 
one can then see that a solution is optimal for the minimization problem if and 
only if it is optimal for the maximization problem. Of course, the equivalence 
of the maximization and minimization problems does not hold in the context 
of approximation. Whether a minimization or a maximization approximation 
algorithm is most appropriate depends on the input. 

As an example of the problem, and of the notation that we adopt, consider the
following. The initial sequence is $r_1 g_1 r_2 g_2 r_3 g_3 b_1 r_4 b_2 r_5 b_3 r_6$, which contains
12 objects with 11 color changes. The letters denote colors and the subscripts
distinguish different objects of the same color. If $k \ge 4$, then an optimal output
sequence is $g_1 g_2 g_3 r_1 r_2 r_3 r_4 r_5 r_6 b_1 b_2 b_3$. It can be achieved by storing $r_1$, $r_2$, and
$r_3$ in the buffer until the buffer contains these red objects and $b_1$. Then $r_1, r_2, r_3$
can be output, and the buffer can store the blue objects until the end. This gives
a value of 2 color changes for the minimization objective function, and a value
of 9 color savings for the maximization objective function.
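
The example can be checked mechanically on small inputs. The following is a minimal brute-force sketch, not an algorithm from this paper: the names `color_changes` and `min_color_changes` are hypothetical, colors are encoded as single characters, and the buffer model assumed is the one described above (every object enters the buffer, which never holds more than k objects, and objects of the same color are interchangeable).

```python
from functools import lru_cache


def color_changes(seq):
    """Number of positions where two neighbouring objects differ in color."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def min_color_changes(colors, k):
    """Exhaustive search for the fewest color changes with a buffer of size k.

    Exponential in general; intended only for tiny illustrative inputs.
    """
    n = len(colors)

    @lru_cache(maxsize=None)
    def best(i, buf, last):
        # i: next input position, buf: sorted tuple of buffered colors,
        # last: color of the most recently output object (None at the start).
        if i == n and not buf:
            return 0
        options = []
        # Option 1: eject one buffered object of some color.
        for c in set(buf):
            rest = list(buf)
            rest.remove(c)
            cost = 0 if last is None or last == c else 1
            options.append(cost + best(i, tuple(rest), c))
        # Option 2: read the next object into the buffer if there is room.
        if i < n and len(buf) < k:
            options.append(best(i + 1, tuple(sorted(buf + (colors[i],))), last))
        return min(options)

    return best(0, (), None)


example = "rgrgrgbrbrbr"  # r1 g1 r2 g2 r3 g3 b1 r4 b2 r5 b3 r6
print(color_changes(example), min_color_changes(example, 4))
```

With a buffer of size 1 the search can only reproduce the input order, which gives a quick sanity check of the model.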

It is not known whether this sorting buffers problem is NP-hard. It is not hard to see
that there is a dynamic programming algorithm for this problem whose running time
is polynomial in $n$ for any fixed buffer size $k$. It is also not hard to see that there is a
dynamic programming algorithm whose running time is polynomial in $n$ for any fixed $c$,
where $c$ is the number of different colors. So if $k$ or
$c$ is $O(1)$, then the problem can be solved exactly in polynomial time. It seems
that there may be real-world applications where the number of colors is not too
large. The best approximation result known for the minimization version,
an approximation ratio of $O(\log^2 k)$, is obtained by the polynomial-time online
algorithm Bounded Waste introduced in [6].

A related problem, Paint Blocking, is studied in [7]. In this case the input
sequence is reordered twice using two different buffers of the same size. After 
the first reordering the number of transitions between different types of objects 
is counted, and the second reordering has to return the sequence to its original 
order. For the minimization problem, a 5-approximation algorithm is given in [7]. 
Our main algorithmic result is a polynomial-time $\frac{1}{20}$-approximation algorithm
for the maximization problem. This is the first constant approximation
for either the maximization or minimization version of the sorting buffers
problem. In order to obtain this result we have to prove several combinatorial
lemmas about the structure of optimal, and near optimal, solutions. We expect
that these lemmas will be useful in further investigations into this problem. In the
process of developing our algorithm, we showed that the analysis of the unified
algorithm in [1] can be generalized to a slightly larger class of problems than
indicated in [1]. Essentially our formulation allows arbitrary pairwise restrictions on
membership in the solution, while in [1] only transitive restrictions are allowed.
It is certainly plausible that this generalization might be useful elsewhere.

2 Observations about the Optimal Schedule 

For any algorithm ALG and any input sequence $\sigma$, we let ALG($\sigma$) denote both
the resulting schedule and the color savings created by this schedule. Let OPT($\sigma$)
denote both an optimal schedule on input $\sigma$ and the color savings in this solution.
We say that the algorithm ALG is a $c$-approximation algorithm if, for any input
sequence $\sigma$, ALG($\sigma$) $\ge c \cdot$ OPT($\sigma$). A schedule is lazy if it never
creates a color change in the output sequence when there is another legal move that doesn’t create
a color change. As noted in [6], one can always change any algorithm into a lazy
algorithm without any loss of performance. The following lemmas are then
almost immediate consequences of this observation.

Lemma 1. Consider an arbitrary input sequence. If two objects of the same 
color are adjacent in the input sequence, then there is an optimal schedule where 
these two objects are adjacent in the output sequence. 

Lemma 2. For any optimal algorithm and any input sequence, we may assume 
that for any color, any two objects of this color have the same order in the input 
sequence and the output sequence. 

One consequence of Lemma 1 is that no color changes are created in the
output sequence. Thus it makes sense to talk purely about the
color savings of the optimal schedule, without needing to consider the possibility of
color changes created by it. This allows us to formally define our problem
in a slightly non-intuitive manner, in which the input is a sequence of groups of
objects with the same color. The fact that this problem statement is equivalent
to the problem statement in the introduction essentially follows from the fact
that we can restrict ourselves to lazy schedules.

Definition 1. The Sorting Buffers Maximization Problem (SBMP) is defined 
as follows: 

— The input sequence $\sigma$ consists of a sequence of groups. Each group has a
color and a size, i.e., the number of items it contains. No two consecutive 
groups have the same color. 
— The output sequence is a permutation of the groups in the input sequence.

— The buffer B can contain a collection of groups with total size at most $k - 1$.
Initially the buffer is empty. Note that the $k - 1$ bound ensures that there is
one space in the buffer left to move objects through.

— Any algorithm for SBMP essentially repeats the following two steps until the 
entire input sequence has been read, and the buffer is empty. 

1. The next group in the input sequence is either appended to the output 
sequence or is put in the buffer (if space permits). 

2. Optionally one or more groups from the buffer are moved to the output 
sequence. 

— When an algorithm ALG is run on an input sequence $\sigma$ it gains a color
savings of one each time the last group in the output sequence and the next 
group to be put there are the same color. 

In SBMP, a lazy schedule is now defined as follows: if the last group in 
the output sequence is of a color c, then first all groups of objects of color c 
in the buffer are output in first-in-first-out order, and then if the next group 
of objects in the input sequence has color c then this group is immediately 
output without being put into the buffer. As before, one can always change any
algorithm/schedule into a lazy algorithm/schedule without loss of performance,
and thus, we only consider lazy schedules. We now switch notation and use $r_i$
to denote the $i$th group with color $r$.
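
The group view of Definition 1 can be made concrete with a small sketch. It is only an illustration under stated assumptions: the helper names `to_groups` and `color_savings` are hypothetical, colors are single characters, and a group is represented as a `(color, size)` pair.

```python
from itertools import groupby


def to_groups(colors):
    """Collapse a sequence of object colors into (color, size) groups,
    so that no two consecutive groups share a color."""
    return [(color, len(list(run))) for color, run in groupby(colors)]


def color_savings(output_groups):
    """One saving is gained each time two consecutive groups in the
    output sequence have the same color."""
    return sum(
        1
        for (a, _), (b, _) in zip(output_groups, output_groups[1:])
        if a == b
    )


groups = to_groups("rrgbbbr")
print(groups)
print(color_savings([groups[1], groups[0], groups[3], groups[2]]))
```

The second call evaluates the output order g, r, r, b, in which the two red groups have been brought together.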

We classify a color savings between two groups, say $r_i$ and $r_{i+1}$, in a schedule
in one of the following three ways:



— Move out-saving (MOS): In this case, $r_i$ is placed in the output sequence
before $r_{i+1}$ is reached, and further all the groups that are between $r_i$ and
$r_{i+1}$ in the input sequence appear after $r_{i+1}$ in the output sequence. Hence,
all the items between $r_i$ and $r_{i+1}$ in the input sequence were in the buffer
when $r_{i+1}$ was output. The out groups for this MOS are defined to be the
groups between $r_i$ and $r_{i+1}$ in the input. An example of a MOS is (here the
block label $A$ is some arbitrary sequence of groups):

$$r_i \; A \; r_{i+1} \;\longrightarrow\; r_i r_{i+1} \; A$$

Note that for a MOS it is not necessary that $r_i$ be put in the buffer, and thus
we will assume that it is not in the optimal solution.

— Move backward-saving (MBS): In this case $r_i$ is put into the buffer, and is
not expelled before group $r_{i+1}$ is reached. An example of a MBS is:

$$r_1 \; A \; r_2 \;\longrightarrow\; A \; r_1 r_2$$



— Move backward and out-saving (MBOS): In this case, $r_i$ is placed in the output
sequence before $r_{i+1}$ is reached, and not all the groups that are between $r_i$
and $r_{i+1}$ in the input sequence appear after $r_{i+1}$ in the output sequence.
An example of a MBOS is:

$$r_i \; A \; B \; r_{i+1} \;\longrightarrow\; A \; r_i r_{i+1} \; B$$

This is a combination of the previous two. At first, $r_i$ is put in the
buffer, and moved backward. Before $r_{i+1}$ is reached, $r_i$ is dropped in the
output sequence, and the groups in $B$ are moved out.

At the time that $r_i$ is placed in the output, let $g_j$ be the next group in the
input sequence. Then the drop point is defined to be the point immediately
before $g_j$ in the input sequence. The out groups for this MBOS are defined
to be the groups between the drop point of $r_i$ and $r_{i+1}$ in the input.

In all three cases, we say that $r_i$ is the first group of the savings and $r_{i+1}$ is
the last group. We now give an illustrative example of these definitions:

$$r_1 \; A \; r_2 \; B \; r_3 \; C \; D \; r_4 \; E \; r_5 \; F \; r_6 \;\longrightarrow\; A \; B \; C \; r_1 r_2 r_3 r_4 r_5 r_6 \; D \; E \; F$$

The first two color savings, which correspond to the pairs $r_1 - r_2$ and $r_2 - r_3$,
are MBS’s. This is because $r_1$ is moved backward to $r_2$, then both of them are
moved backward to $r_3$. After this the $r_1 - r_2 - r_3$ group is dropped in the output
sequence. The groups in $D$ are the out groups for the $r_3 - r_4$ MBOS. Thus, the
color savings corresponding to the $r_3 - r_4$ pair is a MBOS. The drop point for
$r_3$ is between the last group in $C$ and the first group in $D$. The last two color
savings are MOS’s.

Lemma 3. Let $r_i$ and $r_{i+1}$ be any two groups between which there is a MOS,
and let $A$ be the groups between $r_i$ and $r_{i+1}$ in the input sequence. Then:

— No group in $A$ is part of a MOS nor is it the last group in a MBOS.

— No group before $r_i$ in the input sequence is the first group in a MBOS with
drop point between $r_i$ and $r_{i+1}$.

Proof. Each of these possibilities involves placing a group in the output sequence
between $r_i$ and $r_{i+1}$, which contradicts $r_i - r_{i+1}$ being a MOS. □

Lemma 4. Let $r_i$ and $r_{i+1}$ be any two groups between which there is a MBOS.
Let $A$ be the groups between $r_i$ and the drop point for $r_i$ in the input sequence.
Let $B$ be the groups between the drop point for $r_i$ and $r_{i+1}$ in the input sequence.
Then:

— No group in $B$ is part of a MOS, nor is it the last group in a MBOS.

— No group before $r_{i+1}$ in the input sequence is the first group in a MBOS with
drop point between the drop point of $r_i$ and $r_{i+1}$.

Proof. Similar to Lemma 3. □ 

Lemma 5. For any two distinct MOS’s or MBOS’s, the out groups do not overlap.

Proof. For ease of reference, we call the point in the input sequence immediately after
the first group of a MOS the drop point of this savings. Let $r_i - r_{i+1}$ and $b_j - b_{j+1}$
be any two distinct pairs of groups between which there is a MOS or MBOS.
Without loss of generality we assume that the drop point of $r_i - r_{i+1}$ occurs
before the drop point of $b_j - b_{j+1}$. Then, by Lemmas 3 and 4, the drop point of
$b_j - b_{j+1}$ is at the earliest after $r_{i+1}$ in the input sequence. Thus, the out groups
of the two savings do not overlap. □
3 Reduction to Two Problems

We show that either an $\Omega(1)$ fraction of the savings in the optimal solution are
MOS, or an $\Omega(1)$ fraction of the savings in the optimal solution are MBS. We
then show how to efficiently construct a solution that is constant competitive
with respect to the number of MOS color savings, and show how to efficiently
construct a solution that is constant competitive with respect to the number of
MBS color savings. By taking the better of these two solutions, we get a constant
approximation algorithm.

We first note that we can disregard the MBOS. 

Lemma 6. For any input sequence $\sigma$, there exists a solution for which the total
number of MBS’s and MOS’s is at least half of the profit of an optimal
solution.

Proof. Let $\sigma$ be any fixed input sequence, and let OPT($\sigma$) be any fixed optimal
schedule for $\sigma$. We gradually transform this schedule into a new schedule with
the desired properties.

We consider all MBOS of OPT($\sigma$) one by one in order of their first group
starting from the end of input sequence and continuing to the beginning (in the 
opposite of the usual order). During this sweep we maintain the invariant that 
any group further towards the end of the input sequence which is the first group 
in a MBOS in the original optimal schedule either has been turned into the first 
group of a MBS or it has a unique associated MBS in the schedule. Furthermore, 
we do not change any MBS or MOS in the original schedule. As shown below we 
also ensure that no two MBOS’s share the same associated MBS. Consequently 
the resulting schedule has at least half the profit of the optimal solution. 

Let $r_i - r_{i+1}$ be a MBOS under consideration. Let $A$ be the groups in $\sigma$
between $r_i$ and $r_i$’s drop point, and $B$ the groups in $\sigma$ between $r_i$’s drop point
and $r_{i+1}$. That is, this part of the input looks like $\cdots r_i \, A \, B \, r_{i+1} \cdots$. Note that
by Lemmas 4 and 5, no group in $B$ participates in a MOS, nor is it the
last group of a MBOS.

First, if any group in $B$ is the first group of a MBS, then we associate one
of the corresponding MBS’s with the $r_i - r_{i+1}$ MBOS. Note that, as a direct result of
Lemma 5, no other MBOS is associated with this MBS.

If this is not the case, we instead transform the MBOS into a MBS. The
transformation is by induction on the number of groups left in $B$. The base case
is when no groups in $B$ are left. In this case we have turned this MBOS into a
MBS, since $r_i$ is kept in the buffer until $r_{i+1}$ is reached in the input sequence.
For the induction step, let $g_j$ be the group in $B$ closest to the beginning of the
input sequence. If $g_j$ is not the first group in a savings, then the same profit is
gained if $g_j$ is before $r_i$ in the output sequence. Consequently, instead of placing
$r_i$ in the output sequence just before $g_j$ is met in the input sequence, we keep
$r_i$ in the buffer, output $g_j$ directly, and only then place $r_i$ in the output
sequence. Otherwise, if $g_j$ is the first group of a savings, it must be a MBOS. In
this case, we place $g_j$ into the output sequence as soon as $g_j$ is encountered, and
then immediately afterwards place $r_i$ in the output sequence. This may reduce the
total profit gained by the solution by one, as the $g_j - g_{j+1}$ MBOS is deleted. But
due to the invariant, this savings already has an associated MBS in the solution
which pays for the deletion of this MBOS. □



Definition 2. Let the Reduced Sorting Buffers Maximization Problem (SBMP-R)
be defined as the Sorting Buffers Maximization Problem (SBMP) (Definition 1),
except that no group can participate in more than one color savings, and
that profit is only gained for savings of type MBS or MOS.



Note that in SBMP each group can participate in up to two color savings,
one in front and one in back. As an example of this, look at the following input
and output sequence:

$$r_1 \; A \; r_2 \; B \; r_3 \; C \; r_4 \;\longrightarrow\; A \; B \; C \; r_1 r_2 r_3 r_4$$

A total of four groups are involved in the color savings. For SBMP, a total color
savings of three is gained, whereas for SBMP-R, only a total color savings of
two is gained (the two blocks are $r_1 - r_2$ and $r_3 - r_4$). For SBMP-R, another
solution gives rise to the same profit:



$$r_1 \; A \; r_2 \; B \; r_3 \; C \; r_4 \;\longrightarrow\; A \; r_1 r_2 \; B \; C \; r_3 r_4$$



By only looking at SBMP-R instead of SBMP, we lose an approximation
factor of $\frac{1}{4}$.

Lemma 7. Let $\sigma$ be any input sequence for SBMP and SBMP-R. Let OPT($\sigma$)
be an optimal solution for SBMP, and let OPT$_R$($\sigma$) be an optimal solution for
SBMP-R. Then OPT$_R$($\sigma$) $\ge \frac{1}{4}$ OPT($\sigma$).

Proof. By Lemma 6 there exists a schedule for which the total number of MOS’s
and MBS’s is at least half of OPT($\sigma$). Let $S$ be any such schedule, and divide
the resulting output sequence of $S$ into maximal runs of groups of the same
color between which there is a MOS or MBS. So runs are either broken by a
color change or a MBOS. For any such run with $i \ge 2$ groups, $S$ has a profit of
$i - 1$. This run can be divided into $\lfloor i/2 \rfloor$ disjoint pairs of color savings. Thus, for
OPT$_R$ the profit is at least $\lfloor i/2 \rfloor \ge (i-1)/2$, that is, at least one half of the profit
gained by $S$ on MOS’s and MBS’s. By Lemma 6 this is at least OPT($\sigma$)/4. □



Let SBMP-R-MOS be the problem of maximizing the number of color savings
of type MOS where each group participates in at most one savings. Similarly,
let SBMP-R-MBS be the problem of maximizing the number of color savings of
type MBS where each group participates in at most one savings. In Section 4, we
give a polynomial time algorithm solving SBMP-R-MOS. In Section 5, we give a
polynomial time algorithm with an approximation factor of at least $\frac{1}{4}$ for
SBMP-R-MBS. Either the optimal solution for SBMP-R-MOS is a $\frac{1}{5}$ approximation of
the optimal solution for SBMP-R, or the optimal solution for SBMP-R-MBS is
a $\frac{4}{5}$ approximation of the optimal solution for SBMP-R. Hence, the better of our
solutions in Section 4 and Section 5 is a $\frac{1}{5}$ approximation of OPT$_R$. Consequently
we have a $\frac{1}{20}$ approximation of SBMP.
Theorem 1. There is an algorithm with an approximation factor of at least $\frac{1}{20}$
for SBMP running in polynomial time.
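
The constant in Theorem 1 can be traced through the preceding steps. The following is a sketch, assuming the factors are $\frac{1}{4}$ from Lemma 7, an exact algorithm for SBMP-R-MOS (Section 4), and $\frac{1}{4}$ for SBMP-R-MBS (Section 5). The symbols $M$ and $B$, denoting the numbers of MOS and MBS savings in a fixed optimal SBMP-R solution, are introduced here only for illustration, and ALG denotes the better of the two computed solutions.

```latex
\begin{align*}
\mathrm{OPT}_R(\sigma) &= M + B, \\
\mathrm{ALG}(\sigma) &\ge \max\left\{ M,\ \tfrac{1}{4} B \right\}
   \ge \tfrac{1}{5}\,(M + B) = \tfrac{1}{5}\,\mathrm{OPT}_R(\sigma), \\
\mathrm{ALG}(\sigma) &\ge \tfrac{1}{5} \cdot \tfrac{1}{4}\,\mathrm{OPT}(\sigma)
   = \tfrac{1}{20}\,\mathrm{OPT}(\sigma).
\end{align*}
```

The middle inequality holds because either $M \ge \frac{1}{5}(M+B)$, or else $B > \frac{4}{5}(M+B)$ and therefore $\frac{1}{4}B > \frac{1}{5}(M+B)$.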

4 A SBMP-R-MOS Algorithm 

The next piece of the puzzle is a greedy algorithm solving SBMP-R-MOS exactly. 
Recall this problem is to maximize the number of MOS. 

As before, we may assume that the ordering for groups of the same color
does not change. Consequently it is sufficient to consider only MOS between two
groups $r_i$ and $r_{i+1}$ of the same color, where no group of this color occurs between
the two in the input sequence. Further, the total size of the groups in between
$r_i$ and $r_{i+1}$ has to be at most $k - 1$, or else they cannot be in the buffer. Again
one can show, as in Lemma 3, that we may assume that no pair of MOS occur
inside one another.

Instead of solving SBMP-R-MOS directly, we transform it into the problem
of finding a maximum independent set in an interval graph. That is, the input
is a collection of intervals over the real line, and the desired output is a
maximum-cardinality set of pairwise disjoint intervals. If one considers the intervals in
order of increasing right endpoint and greedily selects every interval that is disjoint
from those already selected, it is a standard exercise to show that
this greedy algorithm solves the problem exactly [3,4].

We now construct our reduction. For each possible MOS in the input sequence,
we create a corresponding interval starting at the first group and ending
at the last group of the MOS (inclusive). Then two intervals overlap if and only
if the corresponding MOS’s cannot occur together. This is the case if they either
share a group or would occur inside each other if they were both used. Then a
maximum set of MOS’s corresponds exactly to a maximum independent set in the resulting
interval graph.
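
The reduction and the greedy selection can be sketched as follows. This is only an illustration under stated assumptions: the function names `mos_intervals` and `max_disjoint_intervals` are hypothetical, groups are `(color, size)` pairs indexed by their position in the input, and intervals are closed, so two intervals that share an endpoint count as overlapping.

```python
def mos_intervals(groups, k):
    """Candidate MOS intervals (i, j): consecutive groups of one color whose
    in-between groups have total size at most k - 1."""
    last_seen = {}
    intervals = []
    for j, (color, _) in enumerate(groups):
        if color in last_seen:
            i = last_seen[color]
            between = sum(size for _, size in groups[i + 1:j])
            if between <= k - 1:
                intervals.append((i, j))
        last_seen[color] = j
    return intervals


def max_disjoint_intervals(intervals):
    """Greedy by increasing right endpoint: keep every interval that starts
    strictly after the last kept interval ends (closed intervals)."""
    chosen = []
    last_end = None
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if last_end is None or start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


groups = [("r", 1), ("g", 2), ("r", 1), ("b", 1), ("g", 1), ("b", 2)]
candidates = mos_intervals(groups, k=3)
print(candidates, max_disjoint_intervals(candidates))
```

Sorting dominates the running time, so the selection step takes O(m log m) time for m candidate intervals.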

5 A SBMP-R-MBS Approximation Algorithm 

As part of our approximation algorithm for SBMP-R-MBS we need a general- 
ization of the unified algorithm introduced in [1]. First, we explain the general- 
ization, and then we apply this result on our own problem by using a reduction 
from SBMP-R-MBS to The Resource Allocation Maximization Problem. 

Definition 3 (RAMP). The Resource Allocation Maximization Problem is de- 
fined as follows: 

— The input consists of a number of instances I, each requiring the utilization 
of some limited resource. The amount of resource available is fixed over time, 
and its size is normalized to one. 

— Instances I are each defined by the following four properties: 

• A half-open time interval $[s(I), e(I))$ during which the instance is to be
executed; $s(I)$ and $e(I)$ are the start-time and end-time of the instance.

• The amount of resource necessary, or the width of the instance, $w(I)$, with
$0 < w(I) \le 1$.

• The profit $p(I) > 0$ gained by using this instance.

• The activity $A(I)$, which is a set of instances (always including $I$) that
cannot occur at the same time as $I$. For any two instances $I$ and $J$,
$I \in A(J)$ if and only if $J \in A(I)$.

— The output, or a feasible schedule, is a set of instances $X$ such that for each
instance $I$ in $X$ the set $A(I) \cap X$ contains only $I$, and for each time instant
$t$ the total width of all instances in $X$ that contain $t$ is at most one. The
total profit of $X$ is the sum of the profits of the individual instances in $X$.

— The goal of RAMP is to find a feasible schedule with maximal profit. 
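
The feasibility conditions of Definition 3 translate directly into a check. The sketch below is an illustration only: the `Instance` class, the `conflicts` dictionary (mapping an instance name to the names in its activity), and the function names are assumptions made for this example and do not come from [1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    start: float   # the instance occupies the half-open interval [start, end)
    end: float
    width: float
    profit: float = 1.0


def is_feasible(chosen, conflicts):
    """Check both conditions of a feasible schedule X = chosen."""
    names = {inst.name for inst in chosen}
    # Condition 1: A(I) intersected with X contains only I itself.
    for inst in chosen:
        if (conflicts.get(inst.name, set()) & names) - {inst.name}:
            return False
    # Condition 2: total width at most one at every time. The load only
    # increases at start times, so checking those suffices.
    for t in {inst.start for inst in chosen}:
        load = sum(i.width for i in chosen if i.start <= t < i.end)
        if load > 1 + 1e-9:
            return False
    return True


def total_profit(chosen):
    return sum(inst.profit for inst in chosen)


a = Instance("a", 0, 4, 0.5)
b = Instance("b", 2, 6, 0.5)
c = Instance("c", 3, 5, 0.25)
print(is_feasible([a, b], {}), is_feasible([a, b, c], {}))
```

Because the activities are given as an arbitrary symmetric relation, the same check covers the non-transitive restrictions used in this section.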



Note that the only way our problem differs from the original problem [1] is
the way activities are defined. In our setting each instance $I$ has its own set of
other instances $A(I)$ which cannot occur at the same time as $I$. As an example,
in our problem it is possible to have three instances, $I_1$, $I_2$, and $I_3$, such that
$I_1 \in A(I_2)$ and $I_2 \in A(I_3)$, but $I_1 \notin A(I_3)$. In [1] this cannot be the case, since
activities are transitive, i.e., $I_1 \in A(I_2)$ and $I_2 \in A(I_3)$ implies $I_1 \in A(I_3)$.

By following the analysis in [1], one can verify that it still holds in this more
general setting. The reader is referred to [5] for a more detailed analysis. The
main result about the obtained approximation ratio is the same as in [1]:

Lemma 8 (Lemma 3.2 from [1]). Let $w_{\min}$ ($w_{\max}$) be the minimum (maximum)
width of any instance in the input, and let $\alpha$ be some value larger than
zero. The approximation factor of the unified algorithm is at least

$$\frac{\min\{1,\ \alpha \cdot \max\{w_{\min},\ 1 - w_{\max}\}\}}{1 + \alpha}$$



As noted in [1], the unified algorithm can easily be implemented in polynomial
time. This result also applies to our slightly modified unified algorithm.

We now use this result to make an approximation algorithm for our original 
problem, SBMP-R-MBS, by reducing it to the generalized problem. 

First note, that we may not, as in SBMP, assume that the ordering for groups 
of the same color does not change. This can be seen in the following example: 



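[Example figure garbled in extraction; the recoverable part of the output sequence reads $A \; B \; r_2 r_3 \; C \; b_1 b_2 \; r_1 r_4$.]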



where $r_1$ contains one item, and all other groups each contain $k - 2$ items. If we
want to move $r_3$ to $r_4$, then we cannot move $b_1$ to $b_2$. Thus, the only feasible
solution with a profit of three is the one given above.

Further note, that as only MBS’s are allowed, we may assume that the first 
group of a MBS is moved no farther backward than to the last group: moving 
the two groups farther back in the sequence does not give rise to more profit. 

Also note that we can view the input sequence as a time line. Then a MBS
starts/ends at the time corresponding to its first/last group (inclusive).

With the above in mind, we now construct the reduction. For each color $r$
and for each pair of groups of this color, $r_i$ and $r_j$, where $r_i$ occurs before $r_j$ in the
input sequence and where the size of $r_i$ is at most $k - 1$, we create an instance
$I_{r_i r_j}$ corresponding to the possible $r_i - r_j$ MBS. The width of $I_{r_i r_j}$ is the number
of items in the first group normalized by $k - 1$, i.e., $w(I_{r_i r_j}) = \mathrm{size}(r_i)/(k - 1)$.

As all MBS’s have a profit of one, $p(I_{r_i r_j}) = 1$. Further, the activity $A(I_{r_i r_j})$
contains $I_{r_i r_j}$ as well as those instances that use either $r_i$ or $r_j$, i.e., exactly the
instances which cannot occur at the same time as $I_{r_i r_j}$.
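
The construction of the instances can be sketched in a few lines. This is an illustration under stated assumptions: groups are `(color, size)` pairs identified by their index in the input, and the names `mbs_instances` and `activity` as well as the dictionary layout are hypothetical.

```python
def mbs_instances(groups, k):
    """One candidate instance per ordered pair of same-colored groups
    (first before last) whose first group fits into a buffer of size k - 1."""
    instances = []
    for i, (color, size) in enumerate(groups):
        if size > k - 1:
            continue  # the first group could never be held in the buffer
        for j in range(i + 1, len(groups)):
            if groups[j][0] == color:
                instances.append({
                    "first": i,
                    "last": j,
                    "width": size / (k - 1),  # normalized buffer usage
                    "profit": 1,
                })
    return instances


def activity(inst, instances):
    """All instances that use the first or the last group of inst
    (this always includes inst itself)."""
    used = {inst["first"], inst["last"]}
    return [o for o in instances if used & {o["first"], o["last"]}]


groups = [("r", 1), ("b", 3), ("r", 2), ("b", 1), ("r", 4)]
insts = mbs_instances(groups, k=4)
print([(d["first"], d["last"], d["width"]) for d in insts])
print(len(activity(insts[0], insts)))
```

Two instances can each conflict with a third without conflicting with each other, which is why the pairwise (non-transitive) form of activities is needed here.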

Lemma 9. There is an algorithm running in polynomial time with an approximation
factor of at least $\frac{1}{4}$ for SBMP-R-MBS.

Proof. Suppose all instances have a width of at most $\frac{1}{2}$, i.e., $w_{\max} \le \frac{1}{2}$. In this
case $\alpha = 2$ maximizes the bound of Lemma 8, giving a performance guarantee of $\frac{1}{3}$.

Next, suppose all instances have a width of more than $\frac{1}{2}$, i.e., $w_{\min} > \frac{1}{2}$. In
this case no pair of intersecting instances may both be used. Consequently this
problem is equivalent to the problem of finding a maximum independent set in
an interval graph. Similar to Section 4, this can be solved exactly by a simple
greedy algorithm.

In the general case we solve the problem separately for the instances of width
at most a half and for the instances of width more than a half. Either the optimal
solution for the former case is at least $\frac{3}{4}$ of the optimum, or the optimal solution
for the latter case is at least $\frac{1}{4}$ of the optimum. Hence, the better of the two is
at least a $\frac{1}{4}$ approximation of SBMP-R-MBS. □
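
The numbers in this proof can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the bound of Lemma 8 to be min{1, α · max{w_min, 1 − w_max}} / (1 + α), and the function names `unified_bound` and `combine` are hypothetical. `combine(a, b)` is the worst-case guarantee of taking the better of an a-approximation on one part of the instances and a b-approximation on the rest.

```python
def unified_bound(alpha, w_min, w_max):
    """Guarantee of the unified algorithm as stated in Lemma 8 (assumed form)."""
    return min(1.0, alpha * max(w_min, 1.0 - w_max)) / (1.0 + alpha)


def combine(a, b):
    """If OPT = N + W and the algorithm gains at least max(a*N, b*W), the
    ratio is smallest when a*N == b*W, which gives 1 / (1/a + 1/b)."""
    return 1.0 / (1.0 / a + 1.0 / b)


# Narrow instances: w_max <= 1/2; w_min may be arbitrarily small.
alphas = [i / 10 for i in range(1, 51)]
best_alpha = max(alphas, key=lambda a: unified_bound(a, 0.0, 0.5))
narrow = unified_bound(best_alpha, 0.0, 0.5)

# Wide instances are solved exactly (factor 1) by the greedy algorithm.
print(best_alpha, narrow, combine(narrow, 1.0))
```

On this grid the bound is largest at α = 2, and combining the narrow and wide cases gives the factor stated in the lemma.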
Abstract. In this paper we develop an easily applicable algorithmic technique/tool
for developing approximation schemes for certain types of
combinatorial optimization problems. Special cases that are covered by 
our result show up in many places in the literature. For every such spe- 
cial case, a particular rounding trick has been implemented in a slightly 
different way, with slightly different arguments, and with slightly differ- 
ent worst case estimations. Usually, the rounding procedure depended on 
certain upper or lower bounds on the optimal objective value that have 
to be justified in a separate argument. Our easily applied result unifies 
many of these results, and sometimes it even leads to a simpler proof. 
We demonstrate how our result can be easily applied to a broad family 
of combinatorial optimization problems. As a special case, we derive the 
existence of an FPTAS for the scheduling problem of minimizing the 
weighted number of late jobs under release dates and preemption on 
a single machine. The approximability status of this problem has been 
open for some time. 



1 Introduction 

One of the commonly stated goals of algorithmic research is the development of 
a modestly-sized toolkit of widely applicable algorithmic techniques. The vision 
is that future researchers, particularly those without specialized training in al- 
gorithmics, could use these tools to quickly develop/analyze algorithms for new 
problems. In this paper, we develop an easily and widely applicable algorithmic 
technique/tool for developing approximation schemes for certain types of com- 
binatorial optimization problems. This tool should save algorithmic researchers 
time, and is simple enough to be used by researchers without specialized algo- 
rithmics training. 

Over the years, there have evolved a number of standard approaches for de- 
signing approximation schemes; see for instance Horowitz & Sahni [8,9], Ibarra 
& Kim [10], Sahni [18], and Woeginger [20]. (A review of basic definitions related
to approximation schemes can be found in section 2.) We will investigate one 
of these standard approaches and demonstrate that it applies to a broad family 
of combinatorial optimization problems. The standard approach under investi- 
gation is the technique of rounding the input; this technique goes back to the 
1970s and possibly has first been used in the paper by Horowitz & Sahni [8]. The 
family of combinatorial optimization problems under investigation is defined as 
follows. 

Definition 1. (Subset selection problems)

A subset selection problem $\mathcal{P}$ is a combinatorial optimization problem whose
instances $I = (X, w, S)$ consist of

— a ground set $X$ with $|X| = n$ elements;

— a positive integer weight $w(x)$ for every $x \in X$;

— a structure $S$ that is described by $\ell(S)$ bits.

The structure $S$ specifies for every subset $Y \subseteq X$ whether $Y$ is feasible or
infeasible; this can be done within a time complexity polynomially bounded in $n$ and
$\ell(S)$.

If $\mathcal{P}$ is a minimization problem, then the goal is to find a feasible subset
$Y \subseteq X$ that minimizes $w(Y) = \sum_{y \in Y} w(y)$. If $\mathcal{P}$ is a maximization problem,
then the goal is to find a feasible subset $Y \subseteq X$ that maximizes $w(Y)$. □

The class of subset selection problems described in Definition 1 is very gen- 
eral, and it contains many problems with very bad approximability behavior. 
For instance, the weighted independent set problem (“Given a graph with vertex 
weights, find the maximum weight subset of pairwise non-adjacent vertices”)
belongs to this class. It is known that weighted independent set does not possess
any p-approximation algorithm with a fixed p > 1, unless P=NP (Håstad [7]).
If we additionally impose condition (C) as in the following theorem, then the 
approximability behavior of subset selection problems improves considerably. 

Theorem 1. Let $\mathcal{P}$ be a subset selection problem with instances $I = (X, w, S)$
that satisfies the following condition:

(C) There exists an algorithm that solves $\mathcal{P}$ to optimality whose running time
is polynomially bounded in $n$, in $W := \sum_{x \in X} w(x)$, and in $\ell(S)$.

Then problem $\mathcal{P}$ has an FPTAS.

Theorem 1 is proved in Section 3. The proof is quite straightforward, and 
it mainly uses the folklore rounding tricks from the literature. The main con- 
tribution of this paper is to identify the neat and simple condition (C) that 
automatically implies the existence of an FPTAS. Special cases that are covered 
by Theorem 1 show up at many places in the literature. For every such special 
case, the rounding trick has been implemented in a slightly different way, with 
slightly different arguments, and with slightly different worst case estimations. 
Usually, the rounding procedure depends on certain upper or lower bounds on
the optimal objective value that have to be justified in a separate argument. 
Theorem 1 unifies many of these results, and sometimes it even leads to simpler 
proofs. 

Sections 4 and 5 contain a number of optimization problems (from scheduling
theory and from graph theory) that fit into the framework of Definition 1.
These examples illustrate the wide applicability and ease of use of our result.
As one special case, we prove in Theorem 2 that the scheduling problem
$1 \mid pmtn, r_j \mid \sum w_j U_j$ (the problem of minimizing the weighted number of late
jobs under release dates and preemption on a single machine) has an FPTAS. The
approximability status of this problem has been open for some time, and the
question was considered to be difficult. In particular, the problem does not fit
into the framework for FPTAS’s established by Woeginger [20].

2 Basic Definitions 



An algorithm that returns near-optimal solutions is called an approximation
algorithm; if it does this in polynomial time, then it is called a polynomial time
approximation algorithm. An approximation algorithm is called a $\rho$-approximation
algorithm if it always returns a near-optimal solution with cost at most a factor
$\rho$ above the optimal cost (for minimization problems), respectively at most a
factor $\rho$ below the optimal cost (for maximization problems). The value $\rho \ge 1$
is called the worst-case performance guarantee of this algorithm. A family of
$(1+\varepsilon)$-approximation algorithms over all real $\varepsilon > 0$ with polynomial running
times is called a polynomial time approximation scheme or PTAS, for short. If
the time complexity of a PTAS is also polynomially bounded in $1/\varepsilon$, then it
is called a fully polynomial time approximation scheme or FPTAS, for short.
With respect to relative performance guarantees, an FPTAS is essentially the
strongest possible polynomial time approximation result that we can derive for
an NP-hard problem (unless P=NP holds).



3 Proof of the Main Result 



In this section we will prove the main result of the paper. The proof method 
is essentially due to Horowitz & Sahni [8]. The arguments for minimization 
problems and for maximization problems are slightly different. We start with 
the discussion of minimization problems. 

Let $\varepsilon > 0$ be a small real number. Let $I = (X, w, S)$ be some instance of a
minimization problem $\mathcal{P}$ that belongs to the class of subset selection problems
as defined in Definition 1. Let $x_1, \ldots, x_n$ be an enumeration of the elements of
the ground set $X$ such that

$$w(x_1) \le w(x_2) \le \cdots \le w(x_n). \qquad (1)$$
For $k = 1, \ldots, n$, we define a so-called scaling parameter

$$Z^{(k)} = \varepsilon \cdot \frac{1}{n} \cdot w(x_k). \qquad (2)$$

We introduce a number of new instances $I^{(1)}, \ldots, I^{(n)}$ of problem $\mathcal{P}$. Every new
instance $I^{(k)}$ has the same structure $S$ and the same ground set $X$ as $I$, but it
has a different set of weights $w^{(k)}$. As a consequence, all instances $I^{(k)}$ have the
same feasible solutions as the original instance $I$. The weights $w^{(k)}$ are defined
as follows:

- For $i = 1, \ldots, k$, we set $w^{(k)}(x_i) = \lceil w(x_i) / Z^{(k)} \rceil$.

- For $i = k+1, \ldots, n$, we set $w^{(k)}(x_i) = n \lceil n/\varepsilon \rceil$.

The definition of the parameter $Z^{(k)}$ in (2) yields that $w^{(k)}(x_i) \le \lceil n/\varepsilon \rceil$ for
$1 \le i \le k$. Therefore, the overall weight $W^{(k)}$ of all elements in instance $I^{(k)}$ can
be bounded as

$$W^{(k)} \le \sum_{i=1}^{k} \lceil n/\varepsilon \rceil + \sum_{i=k+1}^{n} n \lceil n/\varepsilon \rceil \le n^2 \lceil n/\varepsilon \rceil. \qquad (3)$$

Hence, $W^{(k)}$ is polynomially bounded in $n$ and in $1/\varepsilon$. If we feed instance $I^{(k)}$
to the exact algorithm in condition (C) in Theorem 1, then the running time
is polynomially bounded in $n$, in $\ell(S)$, and in $1/\varepsilon$. That is precisely the type of
time complexity that we need for an FPTAS. Hence we get Lemma 1.

Lemma 1. Every instance $I^{(k)}$ can be solved to optimality within a time
complexity polynomially bounded in $n$, $\ell(S)$, and $1/\varepsilon$. □

Next, let $Y^*$ denote the optimal solution for the original instance $I$, and let
Opt denote its optimal objective value $w(Y^*)$. Let $Y^{(k)} \subseteq X$ denote the optimal
solution for instance $I^{(k)}$ for $k = 1, \ldots, n$. Let $j$ denote the maximal index with
$x_j \in Y^*$. Obviously,

$$\mathrm{Opt} = w(Y^*) \ge w(x_j). \qquad (4)$$

Furthermore, we claim that

$$Y^{(j)} \subseteq \{x_1, x_2, \ldots, x_j\}. \qquad (5)$$

This statement is trivially true for $j = n$. For $j \le n - 1$, we use that $Y^*$ is some
feasible solution for instance $I^{(j)}$, whereas $Y^{(j)}$ is the optimal feasible solution
for instance $I^{(j)}$. Since $|Y^*| \le j \le n - 1$, this yields

$$w^{(j)}(Y^{(j)}) \le w^{(j)}(Y^*) \le |Y^*| \cdot w^{(j)}(x_j) \le (n-1) \cdot \lceil n/\varepsilon \rceil. \qquad (6)$$

By Inequality (6), the set $Y^{(j)}$ cannot contain any of the expensive elements
$x_{j+1}, \ldots, x_n$ that all have weight $n \lceil n/\varepsilon \rceil$. This proves (5).

We now analyze the quality of the feasible solution $Y^{(j)}$ for the original
instance $I$. In the following chain of inequalities, the first inequality holds since
(5) implies $w(y) \le Z^{(j)} \cdot w^{(j)}(y)$ for all $y \in Y^{(j)}$. The second inequality holds
since $Y^{(j)}$ is the optimal solution for weights $w^{(j)}$. The equation in the third
line follows from (5). The inequality in the fourth line follows from $\lceil a \rceil \le a + 1$.
The inequality in the sixth line follows from $|Y^*| \le n$ and from (2). The final
inequality follows from (4).

$$\begin{aligned}
\sum \{ w(y) \mid y \in Y^{(j)} \} &\le Z^{(j)} \cdot \sum \{ w^{(j)}(y) \mid y \in Y^{(j)} \} \\
&\le Z^{(j)} \cdot \sum \{ w^{(j)}(y) \mid y \in Y^* \} \\
&= Z^{(j)} \cdot \sum \{ \lceil w(y)/Z^{(j)} \rceil \mid y \in Y^* \} \\
&\le Z^{(j)} \cdot \sum \{ w(y)/Z^{(j)} + 1 \mid y \in Y^* \} \\
&= \sum \{ w(y) \mid y \in Y^* \} + |Y^*| \cdot Z^{(j)} \\
&\le \mathrm{Opt} + n \cdot \varepsilon \cdot \frac{1}{n} \cdot w(x_j) \\
&\le (1 + \varepsilon) \cdot \mathrm{Opt}.
\end{aligned}$$

With this, it is clear how to get the FPTAS: We compute the optimal solutions
$Y^{(k)} \subseteq X$ for the instances $I^{(k)}$ with $k = 1, \ldots, n$. By Lemma 1, this can be
done with time complexity polynomially bounded in the length of the encoding
of $I$ and in $1/\varepsilon$. Then we compute the costs of $Y^{(k)}$ with respect to instance $I$,
and we determine the best solution. By the above chain of inequalities, this best
solution has objective value at most $(1 + \varepsilon) \cdot \mathrm{Opt}$. This completes the proof of
Theorem 1 for the case where $\mathcal{P}$ is a minimization problem.
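The minimization scheme above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the names `approx_min` and `brute_force_min` are hypothetical, feasibility is passed in as a predicate on index sets, and the exact algorithm of condition (C) is replaced by an exponential-time brute-force search so that the sketch stays self-contained. Only the scaling and rounding steps follow the proof.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def brute_force_min(weights, feasible):
    """Hypothetical stand-in for the exact algorithm of condition (C).

    Enumerates all subsets (exponential time, tiny demos only) and returns a
    minimum-weight feasible subset as a frozenset of indices, or None.
    """
    n = len(weights)
    best, best_weight = None, None
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            candidate = frozenset(subset)
            if not feasible(candidate):
                continue
            weight = sum(weights[i] for i in candidate)
            if best_weight is None or weight < best_weight:
                best, best_weight = candidate, weight
    return best


def approx_min(weights, feasible, eps, exact=brute_force_min):
    """Return (subset, weight) with weight <= (1 + eps) * Opt, following the proof."""
    n = len(weights)
    eps = Fraction(eps)  # exact rational arithmetic avoids rounding surprises
    order = sorted(range(n), key=lambda i: weights[i])  # x_1, ..., x_n as in (1)
    expensive = n * ceil(n / eps)  # weight given to x_{k+1}, ..., x_n
    best, best_weight = None, None
    for k in range(1, n + 1):
        z_k = eps * weights[order[k - 1]] / n  # scaling parameter Z^(k) from (2)
        scaled = [expensive] * n
        for i in order[:k]:
            scaled[i] = ceil(weights[i] / z_k)  # rounded weights w^(k)
        candidate = exact(scaled, feasible)  # optimal solution Y^(k) of I^(k)
        if candidate is None:
            continue
        weight = sum(weights[i] for i in candidate)  # cost with respect to I
        if best_weight is None or weight < best_weight:
            best, best_weight = candidate, weight
    return best, best_weight
```

With a genuinely pseudo-polynomial `exact` routine in place of the brute-force stand-in, each of the n calls would run on weights of total size at most $n^2 \lceil n/\varepsilon \rceil$, which is where the bound (3) enters.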

Now let us discuss the case where $\mathcal{P}$ is a maximization problem. Consider an
instance $I = (X, w, S)$ of $\mathcal{P}$, and enumerate the elements of the ground set $X$ as
in (1). In a preprocessing phase, we determine for every element $x_k$ ($k = 1, \ldots, n$)
whether there exists a feasible solution that contains $x_k$. This can be done as
follows: We create a new instance from $I$ by setting the weight of element
$x_k$ to $n$ and by setting the weights of the remaining $n - 1$ elements to 1. Clearly,
$x_k$ shows up in some feasible solution if and only if the optimal objective value
of this new instance is greater than or equal to $n$. Since the overall weight in this
new instance is $2n - 1$, the algorithm from condition (C) can be used to solve it in
time polynomially bounded in $n$ and $\ell(S)$.

The main part of our algorithm is built around the maximal index $j$ for which
element $x_j$ occurs in some feasible solution. This implies

$$\mathrm{Opt} \ge w(x_j). \qquad (7)$$

We introduce a scaling parameter $Z^{\#} = \varepsilon \cdot \frac{1}{n} \cdot w(x_j)$. We define a new instance
$I^{\#}$ from $I$ that has new weights $w^{\#}$. For $i = 1, \ldots, j$ we set $w^{\#}(x_i) = \lfloor w(x_i)/Z^{\#} \rfloor$,
and for $i = j + 1, \ldots, n$ we set $w^{\#}(x_i) = 1$. Similarly as in the minimization case,
instance $I^{\#}$ can be solved to optimality within a time complexity polynomially
bounded in $n$, $\ell(S)$, and $1/\varepsilon$.

The optimal solution $Y^{\#}$ of $I^{\#}$ satisfies the following inequalities. These
inequalities run in parallel to the inequalities for the minimization case. In the
final inequality, we use (7) to bound $w(x_j)$.



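$$\begin{aligned}
\sum \{ w(y) \mid y \in Y^{\#} \} &\ge Z^{\#} \cdot \sum \{ w^{\#}(y) \mid y \in Y^{\#} \} \\
&\ge Z^{\#} \cdot \sum \{ w^{\#}(y) \mid y \in Y^* \} \\
&= Z^{\#} \cdot \sum \{ \lfloor w(y)/Z^{\#} \rfloor \mid y \in Y^* \} \\
&\ge Z^{\#} \cdot \sum \{ w(y)/Z^{\#} - 1 \mid y \in Y^* \} \\
&= \sum \{ w(y) \mid y \in Y^* \} - |Y^*| \cdot Z^{\#} \\
&\ge \mathrm{Opt} - n \cdot \varepsilon \cdot \frac{1}{n} \cdot w(x_j) \\
&\ge (1 - \varepsilon) \cdot \mathrm{Opt}.
\end{aligned}$$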

Hence, also maximization problems have an FPTAS. The proof of Theorem 1 is 
complete. 

4 Example: Scheduling to Minimize the Weighted 
Number of Late Jobs 

In this section, we will use the standard three-field scheduling notation (see e.g.
Graham, Lawler, Lenstra & Rinnooy Kan [5] and Lawler, Lenstra, Rinnooy Kan
& Shmoys [14]).

In the scheduling problem $1 \mid\mid \sum w_j U_j$ the input consists of $n$ jobs $J_j$ with
positive integer processing times $p_j$, weights $w_j$, and due dates $d_j$ ($j = 1, \ldots, n$).
All jobs are available for processing at time 0. In some schedule a job is on-time
if its processing is completed by its due date, and otherwise it is late. The goal
is to schedule the jobs without interruption on a single machine such that the
total weight of the late jobs is minimized. The problem $1 \mid\mid \sum w_j U_j$ is known to
be NP-hard in the ordinary sense (Karp [11]).

Problem $1 \mid\mid \sum w_j U_j$ belongs to the class of subset selection problems as
described in Definition 1. The ground set $X$ consists of the $n$ jobs with weights
$w_k$ and total weight $W = \sum_{k=1}^{n} w_k$. The structure $S$ consists of the processing
times $p_j$ and the due dates $d_j$ ($j = 1, \ldots, n$). A subset $Y$ of the jobs is feasible
if the remaining jobs in $X - Y$ can all be scheduled on-time on a single machine;
clearly, this information is specified by the structure $S$. Lawler & Moore [15] give
a dynamic programming formulation that solves $1 \mid\mid \sum w_j U_j$ in $O(nW)$ time.







Then our main result in Theorem 1 implies the following well-known result of 
Gens & Levner [4]. 

Corollary 1. (Gens & Levner [4], 1981)

There exists an FPTAS for minimizing the weighted number of late jobs in the
scheduling problem $1 \mid\mid \sum w_j U_j$. □

A closely related problem is to maximize the total weight of the on-time
jobs. Clearly, the algorithm of Lawler & Moore [15] also solves this maximization
problem in $O(nW)$ time. We get the following result.

Corollary 2. (Sahni [18], 1976)

There exists an FPTAS for maximizing the weighted number of on-time jobs in
the scheduling problem $1 \mid\mid \sum w_j U_j$. □

In the 0/1-knapsack problem, the input consists of $n$ pairs of positive integers
$(w_k, b_k)$ and a positive integer $b$: The weight $w_k$ denotes the profit of the $k$th
item, and $b_k$ denotes the space occupied by this item. The goal is to select a
subset $Y$ that has the maximum profit subject to the condition that it does not
occupy more than $b$ space. The 0/1-knapsack problem is NP-hard (Karp [11]),
and it can be solved in $O(nW)$ time (see for instance Bellman & Dreyfus [1] or
Martello & Toth [17]).
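For illustration, one way to reach the $O(nW)$ bound is a dynamic program indexed by total profit. The sketch below is a standard textbook formulation written for this purpose, not code taken from the cited references; the function name `knapsack_by_profit` is hypothetical, and it returns only the optimal profit value.

```python
def knapsack_by_profit(profits, sizes, capacity):
    """Maximum profit of a subset whose total size is at most `capacity`.

    Runs in O(n * W) time, where W is the sum of all profits.
    """
    total_profit = sum(profits)  # W in the text
    infinity = float("inf")
    # min_space[p] = least total space of a subset with profit exactly p
    min_space = [0] + [infinity] * total_profit
    for profit, size in zip(profits, sizes):
        # Go downwards so that every item is used at most once.
        for p in range(total_profit, profit - 1, -1):
            if min_space[p - profit] + size < min_space[p]:
                min_space[p] = min_space[p - profit] + size
    return max(p for p in range(total_profit + 1) if min_space[p] <= capacity)
```

Because the table is indexed by profit rather than by space, its size depends on $W$ only, which is the form of running time that condition (C) asks for.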

It is easy to see that the 0/1-knapsack problem belongs to the class of subset
selection problems of Definition 1. In fact, it is a special case of the maximization
version of the scheduling problem $1 \mid\mid \sum w_j U_j$ as described above: Essentially,
the $k$th item corresponds to a job with processing time $b_k$, weight $w_k$, and
(universal) due date $b$.

Corollary 3. (Ibarra & Kim [10], 1975)

The 0/1-knapsack problem possesses an FPTAS. □

Another closely related problem is $1 \mid pmtn, r_j \mid \sum w_j U_j$: There are $n$ jobs $J_j$
($j = 1, \ldots, n$) with processing times $p_j$, weights $w_j$, due dates $d_j$, and release
dates $r_j$. In this variant, job $J_j$ cannot be started before its release date $r_j$, but
it may be preempted. Lawler [13] designs a (very complicated) dynamic program
that solves $1 \mid pmtn, r_j \mid \sum w_j U_j$ in $O(n^3 W^2)$ time. We get the following (new)
result.

Theorem 2. There exists an FPTAS for minimizing the weighted number of
late jobs in the scheduling problem $1 \mid pmtn, r_j \mid \sum w_j U_j$.

5 Example: The Restricted Shortest Path Problem 

An instance of the restricted shortest path problem (RSP, for short) consists of
a directed graph $G = (V, A)$ and an integer bound $T$. Every arc $a \in A$ has a
positive integer cost $w_a$ and a positive integer transition time $t_a$. For a directed
path $Y$ in $G$, the cost $w(Y)$ and the transition time $t(Y)$ are defined as the
sum of the costs and transition times, respectively, of the arcs in the path $Y$.
The goal is to find a path $Y$ with $t(Y) \le T$ from a specified source vertex to
a specified target vertex that minimizes the cost. RSP is NP-complete in the
ordinary sense (Garey & Johnson [3]). Furthermore, RSP is solvable in $O(|A| \cdot W)$
time by dynamic programming; see for instance Warburton [19] or Hassin [6].
One way of doing this is to compute for every vertex $v \in V$ and for every cost
$c \in \{0, \ldots, W\}$ the smallest possible transition time of a path from the source
vertex to $v$ with cost $c$.
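The dynamic program just described can be sketched as follows. This is a minimal illustration, not the formulation of Warburton [19] or Hassin [6]: the function name and the arc encoding `(tail, head, cost, time)` are assumptions made for the example, and only the optimal cost (not the path itself) is returned.

```python
def restricted_shortest_path(num_vertices, arcs, source, target, bound):
    """Cheapest cost of a source-target path with transition time <= bound.

    `arcs` is a list of (tail, head, cost, time) tuples with positive integer
    cost and time. Returns None if no path meets the bound. Runs in
    O(|A| * W) time, where W is the sum of all arc costs.
    """
    total_cost = sum(cost for _, _, cost, _ in arcs)  # W in the text
    infinity = float("inf")
    # best_time[c][v] = smallest transition time of a walk from source to v
    # whose cost is exactly c
    best_time = [[infinity] * num_vertices for _ in range(total_cost + 1)]
    best_time[0][source] = 0
    for c in range(total_cost + 1):
        for tail, head, cost, time in arcs:
            # Costs are positive, so row c - cost is already final.
            if cost <= c and best_time[c - cost][tail] + time < best_time[c][head]:
                best_time[c][head] = best_time[c - cost][tail] + time
        if best_time[c][target] <= bound:
            return c  # rows are scanned by increasing cost, so c is optimal
    return None
```

The table also admits walks that revisit vertices, but since all costs and times are positive, removing a cycle from a walk lowers both, so the smallest cost meeting the bound is always attained by a simple path.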

Problem RSP belongs to the class of subset selection problems as described
in Definition 1. The ground set $X$ consists of the arcs $a \in A$ with costs $w_a$.
The structure $S$ consists of the graph $G$, of the transition times $t_a$, of the bound
$T$, and of the source and sink vertices. A subset $Y$ of the arcs is feasible if it
forms a path from source to sink with transition time $t(Y) \le T$. Obviously, this
feasibility information is encoded by the structure $S$. We get the following result.

Corollary 4. (Hassin [6], 1992) 

The restricted shortest path problem RSP possesses an FPTAS. □ 

The result in Corollary 4 was established in 1987 by Warburton [19]
for acyclic directed graphs and then in 1992 by Hassin [6] for arbitrary directed
graphs. Lorenz & Raz [16] and Ergun, Sinha & Zhang [2] improve the time
complexities of these approximation schemes.



References 

1. R.E. Bellman and S.E. Dreyfus (1962). Applied Dynamic Programming. 
Princeton University Press. 

2. F. Ergun, R. Sinha, and L. Zhang (2002). An improved FPTAS for restricted
shortest path. Information Processing Letters 83, 287-291.

3. M.R. Garey and D.S. Johnson (1979). Computers and Intractability. W.H. Free- 
man and Co., New York. 

4. G.V. Gens and E.V. Levner (1981). Fast approximation algorithms for job se- 
quencing with deadlines. Discrete Applied Mathematics 3, 313-318. 

5. R.L. Graham, E.L. Lawler, J.K. Lenstra, and A.H.G. Rinnooy Kan (1979).
Optimization and approximation in deterministic sequencing and scheduling: A
survey. Annals of Discrete Mathematics 5, 287-326.

6. R. Hassin (1992). Approximation schemes for the restricted shortest path problem. 
Mathematics of Operations Research 17, 36-42. 

7. J. Håstad (1999). Clique is hard to approximate within $n^{1-\varepsilon}$. Acta Mathematica
182, 105-142.

8. E. Horowitz and S. Sahni (1974). Computing partitions with applications to 
the knapsack problem. Journal of the ACM 21, 277-292. 

9. E. Horowitz and S. Sahni (1976). Exact and approximate algorithms for 
scheduling nonidentical processors. Journal of the ACM 23, 317-327. 

10. O. Ibarra and C.E. Kim (1975). Fast approximation algorithms for the knapsack 
and sum of subset problems. Journal of the ACM 22, 463-468. 

11. R.M. Karp (1972). Reducibility among combinatorial problems. In: R.E. Miller 
and J.W. Thatcher, editors, Complexity of Computer Computations, Plenum Press, 
New York, 85-104. 







12. E.L. Lawler (1979). Fast approximation schemes for knapsack problems. Mathe- 
matics of Operations Research 4, 339-356. 

13. E.L. Lawler (1990). A dynamic programming algorithm for preemptive schedul- 
ing of a single machine to minimize the number of late jobs. Annals of Operations 
Research 26, 125-133. 

14. E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys (1993). 
Sequencing and scheduling: Algorithms and complexity. In: S.C. Graves, A.H.G. 
Rinnooy Kan, and P.H. Zipkin (eds.) Logistics of Production and Inventory, Hand- 
books in Operations Research and Management Science 4, North-Holland, Ams- 
terdam, 445-522. 

15. E.L. Lawler and J.M. Moore (1969). A functional equation and its application 
to resource allocation and sequencing problems. Management Science 16, 77-84. 

16. D.H. Lorenz and D. Raz (2001). A simple efficient approximation scheme for 
the restricted shortest path problem. Operations Research Letters 28, 213-219. 

17. S. Martello and P. Toth (1990). Knapsack problems: Algorithms and computer
implementations. John Wiley & Sons, England.

18. S. Sahni (1976). Algorithms for scheduling independent tasks. Journal of the ACM 
23, 116-127. 

19. A. Warburton (1987). Approximation of pareto optima in multiple-objective 
shortest path problems. Operations Research 35, 70-79. 

20. G.J. Woeginger (2000). When does a dynamic programming formulation
guarantee the existence of a fully polynomial time approximation scheme (FPTAS)?
INFORMS Journal on Computing 12, 57-75.




Finding k-Connected Subgraphs with Minimum
Average Weight



Prabhakar Gubbala and Balaji Raghavachari

Computer Science Department, University of Texas at Dallas, Richardson, TX 75080

{prabha, rbk}@utdallas.edu



Abstract. We consider the problems of finding k-connected spanning
subgraphs with minimum average weight. We show that the problems are
NP-hard for k > 1. Approximation algorithms are given for four versions
of the minimum average edge weight problem:

1. 3-approximation for k-edge-connectivity,

2. O(log k) approximation for k-node-connectivity,

3. (2 + ε)-approximation for k-node-connectivity in Euclidean graphs, for
any constant ε > 0,

4. 5.8-approximation for k-node-connectivity in graphs satisfying the
triangle inequality.



1 Introduction 



Given a graph $G = (V, E)$ that satisfies a specified property $\mathcal{P}$, and a cost
function $c : E \to \mathbb{R}^+$, consider the problem of finding a subgraph $G' = (V, E')$
that also satisfies $\mathcal{P}$ and has minimum average weight, i.e.,

$$\min \frac{\sum_{e \in E'} c(e)}{|E'|}, \quad \text{where } E' \text{ satisfies } \mathcal{P}.$$



In this paper, we consider the properties of k-edge-connectivity and
k-node-connectivity. Our algorithm for k-node-connectivity also extends to any
monotone property on graphs. $\mathcal{P}$ is a monotone property if $G$ continues to satisfy
$\mathcal{P}$ after the addition of arbitrary additional edges to $G$ (connecting existing vertices).
Connectivity and non-planarity are monotone properties, while acyclicity and
planarity are not. We refer to the minimum-average-weight k-edge-connectivity
problem as Avg-kEc, and the corresponding k-vertex-connectivity problem as
Avg-kVc. Depending on the cost function on edges, the graph can be a general
graph or a graph in a metric space (i.e., its edges satisfy the triangle inequality).
We show that all the above versions of this problem are NP-hard and provide
approximation algorithms for them.



1.1 Previous Work 

For the minimization problem $\min \frac{\sum_{e \in E'} c(e)}{|E'|}$, a lot of research has been done
on the minimization of the numerator, that is, on finding minimum-cost k-vertex or
k-edge connected subgraphs. The algorithm given by Frederickson and JaJa [4]
is a 3-approximation algorithm for finding a minimum-cost biconnected
subgraph. Khuller and Vishkin [5] gave a 2-approximation algorithm for the
minimum-cost k-edge-connectivity problem. Czumaj and Lingas [6,7] proposed
a polynomial-time approximation scheme for the minimum-cost k-connectivity
problem in Euclidean graphs. Cheriyan, Vempala and Vetta [2] gave an
$O(\log k)$-approximation algorithm for the minimum-cost k-vertex-connected
subgraph problem in general graphs if the number of vertices is at least $6k^2$. The
minimum mean cycle problem, another related problem, is to find a cycle in a
graph with minimum average edge weight. Karp [8] gave an $O(nm)$-time
algorithm for this problem in graphs with $n$ vertices and $m$ edges. Ahuja and Orlin
[9] gave an $O(\sqrt{n}\, m \log(nC))$-time scaling algorithm.

1.2 An Illustrative Example 

Figure 1 illustrates an example that shows the difference between the edge
connectivity problems of minimizing total weight versus minimizing average weight.
Let us consider the Euclidean version of Avg-2Ec. All vertices are in a straight
line. The distance between vertices 1 and 2 is n units. The vertices from 2 to n
are all uniformly spaced (one unit apart). Figure 1(a) shows an optimal min-cost
2-edge-connected subgraph whose total weight is 4n - 4. To reduce its average
edge weight, we can augment this solution by adding short edges, thus obtaining
the solution in Figure 1(b), whose average edge weight is slightly less than 3.
Figure 1(c) is an optimal solution of Avg-2Ec, with an average cost of about
2.5 per edge. Note that if we start with the optimal min-cost 2-edge-connected
subgraph shown, no matter how many small edges we add, we will never get an
optimal solution to Avg-2Ec.

2 Structure of Avg-kEc and Avg-kVc 

We first observe a useful lemma that is satisfied by optimal solutions to Avg-kEc
and Avg-kVc. We show that all edges of a graph whose cost is smaller
than the optimal value must be in any optimal solution for the problems. Also,
edges whose cost is more than the average must be critical, i.e., their removal makes
the solution infeasible.

Lemma 1. Let $G = (V, E)$ be a given graph. Consider an optimal solution
$G^* = (V, E^*)$ to either Avg-kEc or Avg-kVc on $G$. Let its value (average edge
weight) be $c^*$. Then the following conditions are satisfied:

1. If $c(e) < c^*$ for any edge $e \in E$, then $e \in E^*$.

2. If $c(e) > c^*$ for any edge $e \in E^*$, then $e$ is a critical edge in $G^*$, i.e.,
$E^* - \{e\}$ is not k-connected.

Proof. $G^*$ is already k-connected (edge or vertex). It remains k-connected if we
add more edges to it. If there is an edge $e$ with $c(e) < c^*$ and $e \notin E^*$, then adding
$e$ to $E^*$ gives a feasible solution whose value is smaller than $c^*$, contradicting its
optimality. Therefore $e \in E^*$. Similarly, if an edge $e$ with $c(e) > c^*$ is included
in $E^*$ and $e$ is not critical, then $E^* - \{e\}$ is a feasible solution whose value is
smaller than $c^*$, also a contradiction. Therefore, in an optimal solution to Avg-kEc
and Avg-kVc, edges cheaper than the optimal value must be included, and
edges costlier than the optimal value must be critical.

[Fig. 1. An illustrative example. (b) Subgraph after “small” edges are added to the subgraph in (a). (c) An optimal Avg-2Ec subgraph with average edge cost ≈ 2.5.]



3 NP Hardness 

Theorem 1. Consider a k-edge-connected, undirected, edge-weighted graph $G =
(V, E)$, for some integer $k \ge 2$. The problem of finding an optimal solution to
Avg-kEc on $G$ is NP-hard.

Proof. We reduce the minimum-cardinality k-edge-connected subgraph problem
of an undirected graph to the Avg-kEc problem. Let $G = (V, E)$ be a given graph
with $n$ nodes. We assign a cost of 1 to all edges of $G$. Select an arbitrary node
$u \in V$. Add $k$ new nodes $X = \{x_1, x_2, \ldots, x_k\}$, with edges of zero cost between
every pair of nodes in $\{u\} \cup X$. For every k-connected subgraph of the new graph,
there is a k-connected subgraph of the original graph of the same cost, obtained
by deleting $X$ and their incident edges. Consider an optimal solution to Avg-kEc on the
new graph. All edges of cost zero must be in any k-connected subgraph, since
without them nodes in $X$ will have degree less than $k$. This makes the average
cost of an edge in an optimal solution strictly less than 1. Therefore, by
Lemma 1, all its edges of cost 1 must be critical. Hence, the number of edges
of cost 1 in it must be a minimum, implying that these edges correspond to a
minimum-cardinality k-edge-connected subgraph of $G$.



4 Definitions 

We use the term “k-connectivity” to mean both k-edge-connectivity and
k-vertex-connectivity. It will be clear from the context which type of connectivity
is used. Let $G = (V, E)$ be the given k-connected graph with $|V| = n$, and a
cost function $c(e)$ defined on its edges. Let $G^* = (V, E^*)$ be an optimal solution
of either Avg-kEc or Avg-kVc. Let $c^*$ be its value (i.e., the average edge
weight of $G^*$). That means $c^* = \frac{\sum_{e \in E^*} c(e)}{|E^*|}$ is the minimum over all spanning
subgraphs which are also k-connected. For any set of edges $X$ and cost $p$, we define
$B_p(X) = \{x \mid x \in X \text{ and } c(x) \ge p\}$, and $S_p(X) = \{x \mid x \in X \text{ and } c(x) < p\}$. We
drop the subscript and write them as $B(X)$ and $S(X)$ when $p = 3c^*$. In other
words, $B(X)$ are those edges of $X$ that cost at least 3 times the optimal value
($c^*$) and $S(X)$ are edges of $X$ that cost less than $3c^*$.

Let $\mathcal{P}$ be a graph property. $\mathcal{P}$ is defined to be a monotone property if whenever
a graph $G = (V, E)$ satisfies $\mathcal{P}$, then so does $G' = (V, E \cup S)$ for any set of edges
$S$. In other words, if a graph $G$ satisfies $\mathcal{P}$, then so does any graph on the same
set of nodes that contains $G$ as a subgraph.



5 Vertex Connectivity Problems 

In this section we consider finding a k-vertex-connected subgraph with minimum
average weight (Avg-kVc). The input is an integer $k$ and a weighted undirected
graph $G = (V, E)$ with vertex-connectivity at least $k$. We provide an algorithm
for Avg-kVc with an approximation factor $\beta(\alpha/(\beta - 1) + 1)$, where $\beta > 1$ is a
user-chosen parameter, and $\alpha$ is the approximation ratio of the algorithm used
for finding a minimum-cost k-node-connected spanning subgraph. So here the
property we consider is k-vertex connectivity. The following algorithm works for
any monotone property on graphs.
5.1 Algorithm

This algorithm uses an approximation algorithm for the minimum-cost k-node-connected
subgraph problem as a subroutine. Let $\alpha$ be the approximation ratio of the algorithm
used for finding a minimum-cost k-node-connected subgraph. We start with the
solution output by it as our initial solution. We calculate the average weight of the
present subgraph and consider a least-cost edge that is not in the solution. If the
inclusion of that edge decreases the average, we keep that edge. We keep adding
edges as long as doing so decreases the average edge cost of the current solution. We
stop when the inclusion of the least-cost edge that is not in the solution does
not decrease the average weight of the edges.



Algorithm Avg-kVc:

1. Find an $\alpha$-approximation of a minimum-cost k-vertex-connected subgraph
on the given graph $G = (V, E)$. Let $G^{\alpha} = (V, E^{\alpha})$ be the solution.

2. $E' \leftarrow E^{\alpha}$

3. Repeat

- Calculate the average weight of the edges in $E'$:
$avg_{app} = \frac{\sum_{e \in E'} c(e)}{|E'|}$.

- Let $x$ be a least-cost edge in $E - E'$, and let $E^{inc} = E' \cup \{x\}$. Calculate
$avg_{inc} = \frac{\sum_{e \in E^{inc}} c(e)}{|E^{inc}|}$.

- If $avg_{inc} < avg_{app}$ then $E' = E' \cup \{x\}$.

Until $avg_{inc} \ge avg_{app}$ or $E = E'$.
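Step 3 of the algorithm can be sketched as follows. This is a minimal illustration, assuming the initial edge set from Step 1 is supplied by the caller and is non-empty; the function name `augment_to_lower_average` and the dictionary encoding of edge costs are hypothetical. The $\alpha$-approximation subroutine itself is not implemented here.

```python
def augment_to_lower_average(edge_costs, initial_edges):
    """Greedily add cheapest unused edges while the average edge cost drops.

    `edge_costs` maps every edge of the graph to its cost; `initial_edges`
    is the (non-empty) edge set E^alpha returned by Step 1.
    Returns the final edge set E' and its average edge cost.
    """
    chosen = set(initial_edges)
    total = sum(edge_costs[e] for e in chosen)
    # Candidate edges of E - E', cheapest first.
    remaining = sorted((e for e in edge_costs if e not in chosen),
                       key=lambda e: edge_costs[e])
    for edge in remaining:
        # avg_inc < avg_app  <=>  c(x) * |E'| < sum of costs in E'
        if edge_costs[edge] * len(chosen) < total:
            chosen.add(edge)
            total += edge_costs[edge]
        else:
            break  # the cheapest unused edge no longer lowers the average
    return chosen, total / len(chosen)
```

Comparing `c(x) * |E'|` with the current total avoids recomputing both averages in every round; the test is algebraically the same as $avg_{inc} < avg_{app}$.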



Theorem 2. Given a k-node-connected, undirected, edge-weighted graph $G =
(V, E)$, if there is an $\alpha$-approximation algorithm for the minimum-cost k-node-connected
subgraph problem and $\beta > 1$ is a constant, then there is a polynomial-time algorithm that
returns a feasible solution of Avg-kVc for which the average weight of the edges
is within $\beta(\alpha/(\beta - 1) + 1)$ of $c^*$.

Proof. Let $E^*$ be an optimal solution of Avg-kVc with value $c^*$, where

$$c^* = \frac{\sum_{e \in E^*} c(e)}{|E^*|}.$$

Observe that the average edge weight of the solution maintained by our algorithm
for Avg-kVc decreases monotonically, since edges are added in nondecreasing
order of weight, and only when their addition decreases the average weight of the
solution. Consider some $\beta > 1$ such that at some point in time, the algorithm
has added all edges that cost at most $\beta c^*$ to the solution. We show that the
average edge cost of the solution at that time ($E'$) satisfies the theorem. Since
the average of the solution output by the algorithm is no larger than this value, the
theorem will follow. Let $\gamma$ be the number of such edges we added to $E^{\alpha}$, the solution
returned by the $\alpha$-approximation algorithm for the k-vertex connectivity problem. In the
following discussion, recall that $S_c(X)$ is the set of all edges of $X$ whose cost is
less than $c$.

$$avg_{app} = \frac{\sum_{e \in E'} c(e)}{|E'|} \le \frac{\sum_{e \in E^{\alpha}} c(e) + \sum_{e \in S_{\beta c^*}(E')} c(e)}{|E'|}$$

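The cost of an $\alpha$-approximate solution is at most $\alpha$ times the cost of an optimal
solution of the minimum-cost k-node-connected subgraph problem. So it is also at most
$\alpha$ times the cost of any feasible solution; in particular,
$\sum_{e \in E^{\alpha}} c(e) \le \alpha \sum_{e \in E^*} c(e)$. It follows that

$$avg_{app} \le \frac{\alpha \sum_{e \in E^*} c(e) + \gamma \beta c^*}{|E^{\alpha}| + \gamma}. \qquad (1)$$

The number of edges in an optimal solution that cost more than $\beta \cdot c^*$ is not
more than $(1/\beta)$ times the total number of edges in it. Since our solution includes
all edges that cost less than $\beta c^*$, the number of edges in our solution is at least
$|E^*|(\beta - 1)/\beta$. That is, $|E^{\alpha}| + \gamma \ge |E^*|(\beta - 1)/\beta$. Therefore

$$avg_{app} \le \frac{\alpha \sum_{e \in E^*} c(e)}{|E^{\alpha}| + \gamma} + \frac{\gamma \beta c^*}{|E^{\alpha}| + \gamma} \le \frac{\alpha \beta}{\beta - 1} c^* + \beta c^*.$$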

We can choose $\beta$ for a given instance as the value that minimizes the above
approximation ratio. For fixed $\alpha$, we can use Newton’s method and show that
the ratio is minimum when $\beta = 1 + \sqrt{\alpha}$.
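The stated minimizer can also be checked by elementary calculus. The derivation below is a sketch added for illustration; the name $f$ for the approximation ratio of Theorem 2 is introduced here and does not appear in the original.

```latex
\begin{align*}
f(\beta) &= \beta\left(\frac{\alpha}{\beta-1}+1\right)
          = \frac{\alpha\beta}{\beta-1}+\beta, \qquad \beta > 1,\\
f'(\beta) &= 1-\frac{\alpha}{(\beta-1)^2}, \qquad
f''(\beta) = \frac{2\alpha}{(\beta-1)^3} > 0,\\
f'(\beta) &= 0 \iff (\beta-1)^2=\alpha \iff \beta = 1+\sqrt{\alpha},\\
f(1+\sqrt{\alpha}) &= \sqrt{\alpha}\,(1+\sqrt{\alpha}) + (1+\sqrt{\alpha})
                    = (1+\sqrt{\alpha})^2 .
\end{align*}
```

For $\alpha = 2$ this gives $(1+\sqrt{2})^2 \approx 5.83$, which agrees with the 5.8-approximation quoted for metric graphs.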

We now show how the result applies to different vertex connectivity problems.
For Euclidean graphs, Czumaj and Lingas [6,7] gave a polynomial-time
approximation scheme, i.e., $\alpha = 1 + \varepsilon$, yielding a $(2 + \varepsilon)$-approximation algorithm
for the Avg-kVc problem on Euclidean graphs. For general graphs, Cheriyan,
Vempala and Vetta [2] gave an $O(\log k)$-approximation algorithm if the number of
vertices is at least $6k^2$. So, $\alpha = O(\log k)$; we also get an $O(\log k)$-approximation
algorithm. For metric graphs, $\alpha = 2$, and we choose $\beta = 1 + \sqrt{2}$, getting a
5.8-approximation.

6 Edge Connectivity Problems 

In this section we consider finding a k-edge-connected subgraph with minimum
average weight (Avg-kEc). The input is an integer $k$ and a weighted undirected
graph $G = (V, E)$ with edge-connectivity at least $k$. We provide a
3-approximation algorithm for Avg-kEc.

We are able to do better for Avg-kEc because there exists a 2-approximation
algorithm for the k-edge-connectivity augmentation problem, whereas currently
there is no constant factor approximation algorithm for the k-vertex-connectivity
augmentation problem. Even for Avg-kVc on Euclidean graphs in the previous
section, if we add some zero cost edges like we do in this algorithm, it ceases
to be a Euclidean graph and there is no known constant factor approximation
algorithm for finding a k-node-connected subgraph.

6.1 Algorithm 

From Lemma 1, it follows that the average of any feasible solution can be
improved by adding any edge in $S(E)$ (edges whose cost is smaller than average).
Also, $c^*$ is at least the cost of a cheapest edge in the graph, but not greater
than the cost of a most expensive edge in the graph. We want to know the range of
$c^*$ in enough detail to define $S(E)$ uniquely. There are at most $\binom{n}{2} - 1$ such
different ranges possible for $c^*$ which can define $S(E)$ uniquely, as explained
below. There are at most $\binom{n}{2}$ distinct edge weights possible in a graph. We
sort them and eliminate duplicates. If $c^*$ lies between the $i$-th smallest element
and the $(i+1)$-th smallest element, $S(E)$ is the set of the $i$ smallest elements. Since
$c^*$ lies between the smallest and largest edge costs, there can be at most $\binom{n}{2} - 1$
such ranges. Since we are seeking a 3-approximation algorithm, we can
add the edges in $S(E)$.

We start with a graph with no edges. We guess the range of $3c^*$ as explained above.
There are only $O(n^2)$ such ranges possible. Once we guess $3c^*$, we set to zero
the cost of all edges in $S(E)$. The costs of the other edges remain the same.
We now find an approximately minimum-weight spanning subgraph that is k-edge-connected.
The solution includes all the zero cost edges and other edges chosen by the
approximation algorithm. We repeat the above scheme for all different ranges for
$3c^*$ and take the solution which has minimum average.

Algorithm Avg-kEc

1. Sort all the edges according to their weights. Put all edges of equal weight into the same class. There are at most $\binom{n}{2}$ different classes. Let L be the number of classes.

2. min-avg = ∞

3. for i = 1 to L

- Take the given graph and set to zero the cost of all edges belonging to the first i classes.

- Find a k-edge connected subgraph in the changed graph.

- Let G' = (V, E') be the graph returned by the algorithm.

- current-avg = $\sum_{e \in E'} c(e) \,/\, |E'|$

- min-avg = min(min-avg, current-avg);

Let min-avg be the average weight of the edges in the approximate solution
and let app be the corresponding subgraph.
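
The loop of Algorithm Avg-kEc can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `avg_kec` and `kruskal_spanning_tree` are hypothetical, the subgraph solver is passed in as a callable, and the stand-in solver shown here is a plain minimum spanning tree routine, which is only meaningful for k = 1. For general k one would substitute a min-cost k-edge connected subgraph approximation such as the one cited as [5].

```python
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (u, v, cost)


def kruskal_spanning_tree(n: int, edges: Sequence[Edge]) -> List[int]:
    """Stand-in solver (assumption, k = 1 only): indices of a minimum spanning tree."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = []
    # sorted() is stable, so ties are broken by edge index
    for idx in sorted(range(len(edges)), key=lambda i: edges[i][2]):
        u, v, _ = edges[idx]
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.append(idx)
    return chosen


def avg_kec(n: int, edges: Sequence[Edge],
            solver: Callable[[int, Sequence[Edge]], List[int]]):
    """Return (min_avg, edge indices) following steps 1-3 of Algorithm Avg-kEc."""
    classes = sorted({cost for _, _, cost in edges})   # step 1: weight classes
    min_avg, best = float("inf"), []                   # step 2
    for threshold in classes:                          # step 3: one guess per class
        # zero out the cost of every edge in the first i classes
        changed = [(u, v, 0 if c <= threshold else c) for u, v, c in edges]
        chosen = set(solver(n, changed))
        # the solution keeps all zero-cost edges plus the edges the solver picked
        chosen.update(i for i, (_, _, c) in enumerate(edges) if c <= threshold)
        # the average is taken over the original costs
        current_avg = sum(edges[i][2] for i in chosen) / len(chosen)
        if current_avg < min_avg:
            min_avg, best = current_avg, sorted(chosen)
    return min_avg, best
```

On the four-vertex example with edges `(0,1,1), (1,2,1), (0,2,1), (2,3,9)`, a spanning tree alone has average weight 11/3, whereas keeping the extra cheap edge `(0,2)` brings the average down to 3. This is the effect of adding below-average edges that the section relies on.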



Theorem 3. Given a k-edge-connected, undirected, edge-weighted graph G = 
(V,E), there is a polynomial-time algorithm that returns a feasible solution of 
Avg-kEc on G for which the average weight of the edges is at most 3c*. 



Proof. Let G' = (V, E') be the solution returned by Algorithm Avg-kEc above, and let G* = (V, E*) be an optimum solution, with

$$\mathrm{avg}_{app} = \frac{\sum_{e \in E'} c(e)}{|E'|}, \qquad c^* = \frac{\sum_{e \in E^*} c(e)}{|E^*|}. \qquad (2)$$

We want to prove

$$\frac{\mathrm{avg}_{app}}{c^*} \le 3. \qquad (3)$$

E* can be divided into two sets: S(E*), consisting of all edges below 3c*, and B(E*), consisting of all edges above or equal to 3c*. Similarly for E':

$$E^* = B(E^*) \cup S(E^*) \qquad (4)$$

$$E' = B(E') \cup S(E') \qquad (5)$$



E* forms a k-edge connected subgraph. We can write it as B(E*) ∪ S(E*). Since E' contains all edges of the original graph whose cost is smaller than 3c*, S(E*) ⊆ S(E'). Therefore, B(E*) ∪ S(E') is a k-edge connected graph. This is a feasible solution for the changed graph whose cost is c(B(E*)). We use a 2-approximation algorithm to find a min-cost k-edge connected subgraph in the changed graph [5]. The set of edges that incur cost in the solution it returns is B(E'), and its cost is at most 2 times the optimum for the changed graph, hence at most 2 times the cost of any feasible solution:

$$\sum_{e \in B(E')} c(e) \le 2 \sum_{e \in B(E^*)} c(e). \qquad (6)$$

From equations (2) and (5), we can write

$$\mathrm{avg}_{app} = \frac{\sum_{e \in B(E')} c(e) + \sum_{e \in S(E')} c(e)}{|B(E')| + |S(E')|}. \qquad (7)$$

If we remove some edges smaller than 3c*, the average of the edges increases (we may assume $\mathrm{avg}_{app} \ge 3c^*$, since otherwise (3) already holds, so every such edge is below the current average). So,

$$\mathrm{avg}_{app} \le \frac{\sum_{e \in B(E')} c(e) + \sum_{e \in S(E') \cap E^*} c(e)}{|B(E')| + |S(E') \cap E^*|}.$$

S(E') consists of all edges in the graph which are less than 3c*. So S(E' ∩ E*) = S(E*). Also, S(E') ∩ E* = S(E*). So,

$$\mathrm{avg}_{app} \le \frac{\sum_{e \in B(E')} c(e) + \sum_{e \in S(E^*)} c(e)}{|B(E')| + |S(E^*)|}.$$

Observe that |S(E*)| ≥ 2/3 |E*| because no more than one-third of the edges of E* can be greater than three times its average weight. From (6) we have

$$\sum_{e \in B(E')} c(e) \le 2 \sum_{e \in B(E^*)} c(e).$$

From the above two observations,

$$\mathrm{avg}_{app} \le \frac{2 \sum_{e \in B(E^*)} c(e) + \sum_{e \in S(E^*)} c(e)}{|B(E')| + \frac{2}{3}|E^*|} \le \frac{2 \sum_{e \in E^*} c(e)}{\frac{2}{3}|E^*|} = 3c^*.$$

This completes the proof of Theorem 3. 



Acknowledgement. We would like to thank Si Qing Zheng for posing Avg-kEc. This research was supported in part by the National Science Foundation under grant CCR-9820902.







References 

1. J. Cheriyan and R. Thurimella, Approximating minimum-size k-connected spanning subgraphs via matching, SIAM J. Comput., 30, pp. 528-560, 2000.

2. J. Cheriyan, S. Vempala and A. Vetta, Approximation algorithms for minimum-cost k-vertex connected subgraphs, STOC 2002, pp. 306-312.

3. C. G. Fernandes, A better approximation for the minimum k-edge-connected spanning subgraph problem, J. Algorithms, 28, pp. 105-124, 1998.

4. G. N. Frederickson and J. JaJa, Approximation algorithms for several graph augmentation problems, SIAM J. Comput., 5, pp. 25-53, 1982.

5. S. Khuller and U. Vishkin, Biconnectivity approximations and graph carvings, J. Assoc. Comput. Mach., 41, pp. 214-235, 1994.

6. A. Czumaj and A. Lingas, A polynomial time approximation scheme for Euclidean minimum cost k-connectivity, ICALP 1998, pp. 682-694, 1998.

7. A. Czumaj and A. Lingas, On approximability of the minimum-cost k-connected spanning subgraph problem, Proc. 10th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 281-290, 1999.

8. R. M. Karp, A characterization of the minimum cycle mean in a digraph, Discrete Math., 23, pp. 309-311, 1978.

9. R. K. Ahuja and J. B. Orlin, New scaling algorithms for assignment and minimum cycle mean problems, Mathematical Programming, 54, pp. 41-56, 1992.




On the (Im)possibility of Non-interactive
Correlation Distillation



Ke Yang

Computer Science Department, Carnegie Mellon University,
5000 Forbes Ave. Pittsburgh, PA 15213, USA;
yangke@cs.cmu.edu



Abstract. We study the problem of non-interactive correlation distillation (NICD). Suppose Alice and Bob each has a string, denoted by $A = a_0 a_1 \cdots a_{n-1}$ and $B = b_0 b_1 \cdots b_{n-1}$, respectively. Furthermore, for every $k = 0, 1, \ldots, n-1$, $(a_k, b_k)$ is independently drawn from a distribution $\mathcal{N}$, known as the “noise model”. Alice and Bob wish to “distill” the correlation non-interactively, i.e., they wish to each apply a function to their strings, and output one bit, denoted by X and Y, such that Prob[X = Y] can be made as close to 1 as possible. The problem is, for what noise models can they succeed? This problem is related to various topics in computer science, including information reconciliation and random beacons. In fact, if NICD is indeed possible for some general class of noise models, then some of these topics would, in some sense, become straightforward corollaries.

We prove two negative results on NICD for various noise models. We prove that for these models, it is impossible to distill the correlation to be arbitrarily close to 1. We also give an example where Alice and Bob can increase their correlation with one bit of communication. This example, which may be of independent interest, demonstrates that even the smallest amount of communication is provably more powerful than no communication.



1 Introduction 

1.1 Non-interactive Correlation Distillation 

Consider the following scenario. Let $\mathcal{N}$ be a distribution over $\Sigma \times \Sigma$, where $\Sigma$ is an alphabet. We call $\mathcal{N}$ a “noise model.” Suppose Alice and Bob each receives a string $A = a_0 a_1 \cdots a_{n-1}$ and $B = b_0 b_1 \cdots b_{n-1}$, respectively, as their local inputs. For every $k = 0, 1, \ldots, n-1$, $(a_k, b_k)$ is independently drawn from $\mathcal{N}$. Now, Alice and Bob wish to engage in a protocol to “distill” their correlation. At the end of the protocol, they wish to each output a bit, denoted by X and Y, respectively, such that both X and Y are “random enough”, while Prob[X = Y] can be made as close to 1 as possible, possibly by increasing n. We call such a protocol a correlation distillation protocol. Furthermore, if Alice and Bob wish to do so non-interactively, i.e., without communication, we call this “non-interactive correlation distillation” (NICD). Notice that in NICD, the most general thing for Alice and Bob to do is to each apply a function to their local inputs and output one bit. The problem of NICD is, for what noise models can Alice and Bob achieve this goal?

We note that NICD is indeed possible for many noise models. For example, if a noise model $\mathcal{N}$ is in fact “noiseless,” i.e. $\mathrm{Prob}_{(a,b) \in \mathcal{N}}[a = b] = 1$, then NICD is possible. However, we are interested in the “noisy” noise models, for example, the binary symmetric model, where Alice and Bob each has an unbiased bit as input, and the two bits agree with probability $1 - p$, and the binary erasure model, where Alice’s input is an unbiased bit x, and Bob’s input is x with probability $1 - p$, and a special symbol $\perp$ with probability p. These models are extensively studied in the context of error correcting codes [3,9], where Alice encodes her information before sending it through a “noisy channel”. It is known that there exist efficient encoding schemes that withstand these noise models and allow Alice and Bob to achieve almost perfect correlation. However, in the case of NICD, the “raw data” are already noisy. Can the techniques in error correcting codes be used here, and is NICD possible for these noise models?

1.2 Motivations and Related Work 

Besides the obvious relation to error correcting codes, the study of NICD is 
naturally motivated by several other topics. We review these topics and discuss 
some of the related work. 

Information Reconciliation. Information reconciliation is an extensively studied 
topic [4,13,6,7,8] with applications in quantum cryptography and information- 
theoretical cryptography. In this setting, Alice and Bob each receives a sequence 
of random bits drawn from a noise model, while Eve, the eavesdropper, also possesses
some information about their bits. Alice and Bob wish to “reconcile”
their information via an “information reconciliation protocol”, where they ex- 
change information in a noiseless, public channel in order to agree on a random 
string U with very high probability. Therefore, information reconciliation proto- 
cols are somewhat like correlation distillation protocols. However, the primary 
concern for information reconciliation is privacy, i.e., that Eve gains almost no 
information about U. Notice that Eve can see the conversation between Alice and 
Bob, and thus maximum privacy would be achieved if information reconciliation 
can be performed without communication. 

Random Beacons. A random beacon is an entity that broadcasts uncorrelated, unbiased random bits. The concept of random beacons was first introduced in 1983 by Rabin [15], who showed how they can be used to solve various problems in cryptography. Since then, random beacons have found many applications in security and cryptography [5,12,2,10]. There are many proposals to construct a publicly verifiable random beacon, among them ones that use the signals from a cosmic source [14]. In these proposals, Alice (as the beacon owner) and Bob (as the verifier) both point a telescope at an extraterrestrial object, e.g. a pulsar, and then measure the signals from it. Presumably these signals contain a sufficient amount of randomness. Then Alice converts her measurement results into a sequence of random bits, and publishes them as beacon bits. Bob can then verify the bits by performing his own measurement and conversion. However, it is inevitable that there would be discrepancies in the results of Alice and Bob, due to measurement errors (described by a noise model). These discrepancies may cause the beacon bits published by Alice to disagree with the ones computed by Bob. One of the major concerns in the study of random beacons is to prevent cheating in the presence of measurement error. In other words, one needs to design a mechanism to prevent Alice from maliciously modifying her measurement data in order to affect the beacon bits, while pretending that the modification comes from the measurement error. Notice that in general, there is no communication between Alice and Bob. We note that if NICD is possible, then the cheating problem would be solved, since NICD protocols can be used to distill almost perfectly correlated bits. Then with very high probability, the bits output by Alice and Bob should agree, and this essentially removes the measurement error.

Related Work. As we have discussed, the problem of NICD lies, in some sense, at the foundations of both the studies of information reconciliation and random beacons. In fact, researchers from both areas have, to some extent, considered the problem of NICD. In particular, a basic version of the problem concerning only the binary symmetric noise model was discovered and proven independently by several researchers as early as 1991, including Alon, Maurer, and Wigderson [1] and Mossel and O’Donnell [14]. They proved that NICD is impossible over the binary symmetric noise model. Mossel and O’Donnell studied a multi-party version of this problem, where k > 2 parties wish to agree on some random bits. They also only considered the binary symmetric noise model. In fact, we are not aware of any prior work that studies NICD beyond the binary symmetric noise model.

We stress the importance of understanding the problem of NICD for general 
noise models. As we have mentioned, this problem is important to both the stud- 
ies of information reconciliation and random beacons. In both studies, there is 
no reason to assume that the binary symmetric noise model is the only reason- 
able one. As an example, the measurement of the signals from extraterrestrial 
objects is not unique, and different measurements may yield different noise mod- 
els. If one of these noise models admits NICD, then the problems of information 
reconciliation and random beacons could, in some sense, be solved. Therefore, a
better understanding of NICD over more general classes of noise models would be
very helpful.

1.3 Our Contribution 

We study NICD beyond the binary symmetric noise model. First, we prove an impossibility result for NICD over a class of so-called “regular” noise models in Section 3. Intuitively, a noise model $\mathcal{N}$ is regular if it satisfies the following three requirements: that it is symmetric, i.e., $\mathcal{N}(a,b) = \mathcal{N}(b,a)$ for every $a, b \in \Sigma$; that it is locally uniform, i.e., both the distributions of the local inputs of Alice and Bob are uniform; that it is connected, i.e., $\Sigma$ cannot be partitioned into $\Sigma_0$ and $\Sigma_1$ such that $\mathcal{N}(a,b) = \mathcal{N}(b,a) = 0$ for all $a \in \Sigma_0$ and $b \in \Sigma_1$. Notice that if a noise model is not connected, then NICD is indeed possible for such a model. Suppose $\Sigma$ is partitioned into $\Sigma_0$ and $\Sigma_1$. If Alice and Bob interpret symbols in $\Sigma_0$ as a “0” and symbols in $\Sigma_1$ as a “1”, then they essentially have a noiseless binary noise model, which admits NICD.

In Section 4, we move on to the binary erasure noise model. It is the simplest noise model that is not symmetric, and thus is not regular. The binary erasure model is also a realistic one. Consider as an example the situation where Alice and Bob receive their inputs by observing a pulsar. It is quite likely that the noise in the measurements by Alice and Bob is of the “erasure type”, i.e., the corruption of information can be detected. Furthermore, it is also possible that Alice and Bob have different measurement apparatus and different levels of accuracy. In the random beacon problem, Alice (as the beacon owner) might own a more sophisticated (and more expensive) measuring device with higher accuracy, while Bob (as the verifier) has a noisier measurement device. An extreme case would be that Alice has perfect accuracy in her measurement, but Bob’s measurement is noisy. Such a situation can be described by the binary erasure noise model. We prove that NICD is impossible for this noise model as well.

The impossibility results we prove suggest that for many noise models, communication is essential for correlation distillation. Thus it is interesting to ask how much communication is essential, and in particular, whether a single bit of communication helps. In Section 5, we answer this question in the positive by presenting a protocol that non-trivially distills correlation from the binary symmetric noise model with one bit of communication. This result shows that even the minimal amount of communication is provably more powerful than no communication at all. The protocol itself may also be of independent interest.

Due to space limitations, some of the proofs are omitted and the reader is
referred to the full version of this paper [16].



2 Preliminaries and Notations 

We use [n] to denote the set $\{0, 1, \ldots, n-1\}$. We often work with symbols from a particular alphabet, which is a finite set of cardinality q and is normally denoted by $\Sigma$. We often identify $\Sigma$ with [q].

All vectors are column vectors by default. A string is a sequence of symbols from an alphabet. We identify a string with a vector and use them interchangeably. For a string x of length n, we use x[j] to denote its j-th entry, for $j = 0, 1, \ldots, n-1$. We use $\mathbf{1}_n$ to denote the all-one vector (each of whose entries is 1) of dimension n. When the dimension is clear from the context, it is often omitted.

We identify a function with its truth table, which is written as a vector. For example, we view a function over $\{0,1\}^n$ also as a $2^n$-dimensional vector. We assume a canonical ordering of n-bit strings.

We will work with tensor products. Let A and B both be vectors or both be matrices. We use $A \otimes B$ to denote the tensor product of A and B, and $A^{\otimes n}$ to denote the n-th tensor power of A, which is the tensor product of n copies of A.

Definition 1 (Noise Model). A noise model over an alphabet $\Sigma$, often denoted by $\mathcal{N}$, is a probabilistic distribution over $\Sigma \times \Sigma$. The n-th tensor power of a noise model $\mathcal{N}$ is the distribution of a pair of length-n strings (A, B), where $A = a_0 a_1 \cdots a_{n-1}$ and $B = b_0 b_1 \cdots b_{n-1}$, and $(a_k, b_k)$ is independently drawn from $\mathcal{N}$ for $k = 0, 1, \ldots, n-1$.

In this paper we study randomized, non-interactive protocols. For the impossibility results in Section 3 and Section 4, we assume that Alice and Bob each outputs a single bit, since it suffices to prove a negative result on the “minimally useful” protocols. We shall consider protocols that output multiple bits in Section 5.

Since Alice and Bob do not communicate, the most general thing they can do is to apply a (randomized) function to their private inputs and output a bit.

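Definition 2 (Protocols). A protocol $\mathcal{P}$ over a noise model $\mathcal{N}$ is a family of function pairs $\{(\phi_n^A, \phi_n^B)\}$ for n > 0, where $\phi_n^A, \phi_n^B : \Sigma^n \to [-1, 1]$ are called the characteristic functions. The output of protocol $\mathcal{P}$ over noise model $\mathcal{N}$, denoted by $\mathcal{P}(\mathcal{N})$, is a sequence of distributions $\{\mathcal{D}_1, \mathcal{D}_2, \ldots\}$, where the n-th distribution $\mathcal{D}_n$ is of the bit pair $(X_n, Y_n)$, defined as follows.

$$(A, B) \leftarrow \mathcal{N}^{\otimes n}, \qquad X_n \leftarrow B_{(1 + \phi_n^A(A))/2}, \qquad Y_n \leftarrow B_{(1 + \phi_n^B(B))/2},$$

where $B_p$ is the Bernoulli distribution with parameter p, defined as $B_p(0) = 1 - p$ and $B_p(1) = p$.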



Definition 3 (Statistical Distance). The statistical distance between two probabilistic distributions A and B, denoted as SD(A, B), is defined to be $\mathrm{SD}(A, B) = \frac{1}{2} \sum_x |A(x) - B(x)|$, where the summation is taken over the support of A and B. If $\mathrm{SD}(A, B) \le \epsilon$, we say A is $\epsilon$-close to B.

Definition 4 ($\delta$-Locally Uniform Protocols). A protocol $\mathcal{P}$ is $\delta$-locally uniform over a noise model $\mathcal{N}$, if for every n > 0, both $X_n$ and $Y_n$ are $\delta$-close to the uniform distribution over {0,1}, where $(X_n, Y_n)$ is the n-th distribution of $\mathcal{P}(\mathcal{N})$. A protocol is locally uniform if it is 0-locally uniform.

Definition 5 (Correlation of Protocols). The correlation of a protocol $\mathcal{P}$ over a noise model $\mathcal{N}$, denoted by $\mathrm{Cor}_{\mathcal{N}}[\mathcal{P}]$, is defined to be

$$\mathrm{Cor}_{\mathcal{N}}[\mathcal{P}] = \liminf_{n} \big( 2 \cdot \mathrm{Prob}[X_n = Y_n] - 1 \big) \qquad (1)$$

where $(X_n, Y_n)$ is the n-th distribution of $\mathcal{P}(\mathcal{N})$.



3 An Impossibility Result for Regular Noise Models

We prove a general impossibility result for NICD over the regular noise models. 

Definition 6 (Distribution Matrix). Let $\mathcal{N}$ be a noise model over $\Sigma$, where $|\Sigma| = q$. We say a $q \times q$ matrix M is the distribution matrix for $\mathcal{N}$, if $M_{x,y} = \mathcal{N}(x, y)$ for all $x, y \in \Sigma$.¹ We write the distribution matrix of $\mathcal{N}$ as $M_{\mathcal{N}}$.

Definition 7 (Regular Noise Model). A $q \times q$ matrix M is regular if it is symmetric, and $\mathbf{1}_q$ is the unique eigenvector with the largest absolute eigenvalue. Let $\epsilon$ be the difference between M’s largest absolute eigenvalue and the second largest. We call $q \cdot \epsilon$ the scaled eigenvalue gap of M. A noise model $\mathcal{N}$ is regular if its distribution matrix is regular.

Theorem 1. If $\mathcal{N}$ is a regular noise model over $\Sigma$ with scaled eigenvalue gap $\epsilon$, then the correlation of any $\delta$-locally uniform protocol over $\mathcal{N}$ is at most $1 - \epsilon(1 - 4\delta^2)$.

Notice that a distribution matrix M is non-negative (that is, every entry is non-negative). By the Perron-Frobenius Theorem [11], if M is symmetric, irreducible, and has $\mathbf{1}_q$ as an eigenvector, then $\mathbf{1}_q$ is the unique eigenvector with the largest eigenvalue, and thus M is regular.

Proof. Consider a protocol $\mathcal{P}$ over the noise model $\mathcal{N}$. We define $q = |\Sigma|$ and identify $\Sigma$ with [q] for the rest of the proof. We use M to denote the distribution matrix of $\mathcal{N}$ and denote the eigenvectors of M by $v_0, v_1, \ldots, v_{q-1}$ with corresponding eigenvalues $\lambda_0, \ldots, \lambda_{q-1}$. We assume that $|\lambda_0| \ge |\lambda_1| \ge \cdots \ge |\lambda_{q-1}|$. Since M is regular, $\lambda_0$ is the unique largest eigenvalue, and it corresponds to the eigenvector $\mathbf{1}_q$.

Since M is the distribution matrix, we know that the sum of all its entries is 1. Thus we have

$$1 = \mathbf{1}_q^T \cdot M \cdot \mathbf{1}_q = \lambda_0 \cdot \mathbf{1}_q^T \cdot \mathbf{1}_q = \lambda_0 \cdot q,$$

or $\lambda_0 = 1/q$. Since the scaled eigenvalue gap of M is $\epsilon$, we know that $|\lambda_1| = (1 - \epsilon)/q$.

Consider the characteristic functions $\phi_n^A$ and $\phi_n^B$. It is easy to see that

$$\mathrm{Prob}[X_n = 1] = \frac{1}{2} \Big( 1 + \sum_{a \in \Sigma^n} \sum_{b \in \Sigma^n} M^{\otimes n}(a, b) \cdot \phi_n^A(a) \Big) \qquad (2)$$

Clearly, $M^{\otimes n}$ is the distribution matrix for $\mathcal{N}^{\otimes n}$. We will be using a result about the eigenvalues and eigenvectors of $M^{\otimes n}$, stated in Lemma 1.

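Since $\mathcal{P}$ is $\delta$-locally uniform, we have

$$\Big| \sum_{a \in \Sigma^n} \sum_{b \in \Sigma^n} M^{\otimes n}(a, b) \cdot \phi_n^A(a) \Big| \le 2\delta \qquad (3)$$

or $|(\phi_n^A)^T \cdot M^{\otimes n} \cdot \mathbf{1}_{q^n}| \le 2\delta$, as we identify $\phi_n^A$ with the $q^n$-dimensional vector represented by its truth table. Since $\mathbf{1}_q$ is an eigenvector of M with eigenvalue 1/q, $\mathbf{1}_{q^n}$ is an eigenvector of $M^{\otimes n}$ with eigenvalue $1/q^n$ (see Lemma 1). Since M is symmetric, so is $M^{\otimes n}$. Thus we have

$$|\mathbf{1}_{q^n}^T \cdot \phi_n^A| \le 2\delta \cdot q^n. \qquad (4)$$

Similarly we have

$$|\mathbf{1}_{q^n}^T \cdot \phi_n^B| \le 2\delta \cdot q^n. \qquad (5)$$

¹ Here we identify $\Sigma$ with [q].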

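Now, we consider the correlation of $\mathcal{P}$. Let $(X_n, Y_n)$ be the outputs of Alice and Bob. Then we have

$$2 \cdot \mathrm{Prob}[X_n = Y_n] - 1 = \sum_{a \in \Sigma^n} \sum_{b \in \Sigma^n} M^{\otimes n}(a, b) \cdot \phi_n^A(a) \cdot \phi_n^B(b) \qquad (6)$$

In other words, we have

$$2 \cdot \mathrm{Prob}[X_n = Y_n] - 1 = (\phi_n^A)^T \cdot M^{\otimes n} \cdot \phi_n^B \qquad (7)$$

We diagonalize the matrix $M^{\otimes n}$. First we define a natural notion of inner product: $\langle A, B \rangle = \frac{1}{q^n} \sum_x A[x] B[x]$. It is obvious that under this inner product, both $\phi_n^A$ and $\phi_n^B$ have norm at most 1. Since $M^{\otimes n}$ is symmetric, it has a set of eigenvectors that form an orthonormal basis. We denote the eigenvectors of $M^{\otimes n}$ by $u_t$ with corresponding eigenvalues $\mu_t$, where $t \in [q^n]$. We assume that $|\mu_0| \ge |\mu_1| \ge \cdots \ge |\mu_{q^n - 1}|$. By Lemma 1, the eigenvalues are of the form $\prod_{i=1}^{n} \lambda_{k_i}$, where $k_i \in [q]$. Therefore $M^{\otimes n}$ has a unique maximum eigenvalue $\mu_0 = \lambda_0^n = 1/q^n$, which corresponds to the eigenvector $\mathbf{1}_q^{\otimes n} = \mathbf{1}_{q^n}$. The second largest absolute eigenvalue of $M^{\otimes n}$ is $|\mu_1| = \lambda_0^{n-1} \cdot |\lambda_1| = (1 - \epsilon)/q^n$.

Now we perform a Fourier analysis on the vectors $\phi_n^A$ and $\phi_n^B$. We write $\phi_n^A = \sum_{t \in [q^n]} \alpha_t \cdot u_t$ and $\phi_n^B = \sum_{t \in [q^n]} \beta_t \cdot u_t$. Then by Parseval, we have $\sum_t \alpha_t^2 \le 1$ and $\sum_t \beta_t^2 \le 1$. Furthermore, from (4) and (5), we have $|\alpha_0| \le 2\delta$ and $|\beta_0| \le 2\delta$.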

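Putting things together, we have

$$\begin{aligned}
2 \cdot \mathrm{Prob}[X_n = Y_n] - 1 &= (\phi_n^A)^T \cdot M^{\otimes n} \cdot \phi_n^B \\
&= q^n \cdot \sum_{t \in [q^n]} \alpha_t \cdot \beta_t \cdot \mu_t \\
&\le \epsilon \cdot |\alpha_0 \beta_0| + (1 - \epsilon) \sum_{t} |\alpha_t \beta_t| && \text{(eigenvalue gap)} \\
&\le \epsilon \cdot 4\delta^2 + (1 - \epsilon) \sqrt{\sum_t \alpha_t^2 \cdot \sum_t \beta_t^2} && \text{(Cauchy-Schwarz)} \\
&\le 1 - \epsilon (1 - 4\delta^2).
\end{aligned}$$

Since this holds for every n, the same bound holds for the correlation of $\mathcal{P}$.

Lemma 1. Let A be an $a \times a$ matrix with eigenvectors $v_0, \ldots, v_{a-1}$, with corresponding eigenvalues $\lambda_0, \ldots, \lambda_{a-1}$. Let B be a $b \times b$ matrix with eigenvectors $u_0, \ldots, u_{b-1}$, with corresponding eigenvalues $\mu_0, \ldots, \mu_{b-1}$. Then the eigenvectors of the matrix $A \otimes B$ are $v_i \otimes u_j$ with corresponding eigenvalues $\lambda_i \cdot \mu_j$, for $i \in [a]$ and $j \in [b]$.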
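
Lemma 1 can be checked numerically on a small case. The sketch below is only an illustration with hypothetical helper names (`kron_matrix`, `kron_vector`, `is_eigenpair`, `binary_symmetric_matrix`); it builds the Kronecker product in plain Python with exact rational arithmetic and verifies that every tensor product of eigenvectors is an eigenvector whose eigenvalue is the product of the two eigenvalues. The test matrix is the 2 × 2 distribution matrix with diagonal entries (1 − p)/2 and off-diagonal entries p/2, chosen here as an assumption because its eigenpairs are easy to write down.

```python
from fractions import Fraction


def kron_matrix(A, B):
    """Kronecker (tensor) product: entry ((i,k),(j,l)) equals A[i][j] * B[k][l]."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def kron_vector(v, u):
    """Tensor product of two vectors, using the same index ordering as kron_matrix."""
    return [x * y for x in v for y in u]


def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def is_eigenpair(M, vec, val):
    """True if M * vec == val * vec (exact comparison, intended for Fractions)."""
    return mat_vec(M, vec) == [val * x for x in vec]


def binary_symmetric_matrix(p):
    """2 x 2 matrix with (1 - p)/2 on the diagonal and p/2 off the diagonal."""
    return [[(1 - p) / 2, p / 2], [p / 2, (1 - p) / 2]]


p = Fraction(1, 4)
M = binary_symmetric_matrix(p)
# eigenpairs of M: (1, 1) with eigenvalue 1/2 and (1, -1) with eigenvalue (1 - 2p)/2
eigenpairs = [([1, 1], Fraction(1, 2)), ([1, -1], (1 - 2 * p) / 2)]
M2 = kron_matrix(M, M)
lemma_holds = all(
    is_eigenpair(M2, kron_vector(v, u), lam * mu)
    for v, lam in eigenpairs
    for u, mu in eigenpairs
)
```

With q = 2 and n = 2 this also matches the values used in the proof: the largest eigenvalue of the tensor square is 1/q^n = 1/4, and the second largest absolute eigenvalue is the product of 1/2 and (1 − 2p)/2.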



Definition 8 (Binary Symmetric Noise Model). The binary symmetric noise model is a distribution over alphabet {0, 1}, denoted by $\mathcal{S}_p$, and is defined as $\mathcal{S}_p(0,0) = \mathcal{S}_p(1,1) = (1-p)/2$ and $\mathcal{S}_p(0,1) = \mathcal{S}_p(1,0) = p/2$.

Corollary 1. The correlation of any locally uniform protocol over the binary 
symmetric noise model Sp is at most 1 — 2p. 

It is easy to see that this bound is tight, since the naive protocol where both Alice and Bob output their first bits is locally uniform with correlation $1 - 2p$.

Proof. Notice that $\mathcal{S}_p$ is regular with scaled eigenvalue gap 2p.

This corollary was independently discovered by various researchers, including Alon, Maurer, and Wigderson [1], and Mossel and O’Donnell [14], the latter attributing it to “folklore”.
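
For very small n, Corollary 1 and its tightness can be confirmed by exhaustive search. The following sketch is an illustration under stated assumptions rather than part of the proof: it only considers deterministic protocols in which Alice and Bob each apply a balanced Boolean function (these are locally uniform because each party's input is uniform), and the function name `max_correlation` is hypothetical. It enumerates all pairs of balanced functions on n bits and returns the largest value of 2 · Prob[X = Y] − 1 under the n-th tensor power of the binary symmetric model.

```python
from fractions import Fraction
from itertools import combinations, product


def max_correlation(n, p):
    """Largest 2*Prob[f(A) = g(B)] - 1 over pairs of balanced f, g on n bits."""
    inputs = list(product((0, 1), repeat=n))

    def joint(a, b):
        # probability of the string pair (a, b): each coordinate pair is drawn
        # independently, agreeing with weight (1 - p)/2 and differing with p/2
        pr = Fraction(1)
        for ak, bk in zip(a, b):
            pr *= (1 - p) / 2 if ak == bk else p / 2
        return pr

    weight = {(a, b): joint(a, b) for a in inputs for b in inputs}
    # a balanced function is represented by the set of inputs it maps to 1
    balanced = [frozenset(s) for s in combinations(inputs, len(inputs) // 2)]
    best = Fraction(-1)
    for f in balanced:
        for g in balanced:
            agree = sum(w for (a, b), w in weight.items() if (a in f) == (b in g))
            best = max(best, 2 * agree - 1)
    return best
```

Calling `max_correlation(2, Fraction(1, 10))` searches 36 function pairs; n = 3 already requires 4900 pairs, so the search is only practical for tiny n. The point of the exercise is that using more input bits does not beat the first-bit protocol.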

4 The Binary Erasure Noise Model 

We prove a similar impossibility result for another noise model, namely the binary erasure noise model. Intuitively, this model describes the situation where Alice sends an unbiased bit to Bob, which is erased (and replaced by a special symbol $\perp$) with probability p.

Definition 9 (Binary Erasure Noise Model). The binary erasure noise model is a distribution over alphabet $\{0, 1, \perp\}$, denoted by $\mathcal{E}_p$ and defined as $\mathcal{E}_p(0,0) = \mathcal{E}_p(1,1) = (1-p)/2$, $\mathcal{E}_p(0,\perp) = \mathcal{E}_p(1,\perp) = p/2$.

Notice that in this model, Alice’s input is the uniform distribution over {0, 1}, and Bob’s input is 0 and 1 with probability $(1-p)/2$ each, and $\perp$ with probability p. A naive protocol under this model only uses the first pair of the inputs. Alice outputs her bit, and Bob outputs his bit if his input is 0 or 1, and outputs a random bit if his input is $\perp$. This is a locally uniform protocol with correlation $1 - p$. The next theorem shows that no protocol can do much better than the naive protocol.

Theorem 2. The correlation of any locally uniform protocol over the noise
model $\mathcal{E}_p$ is at most .

We suspect that this bound is not tight, but it suffices to show that the correlation is bounded away from 1 by an amount independent of n. Therefore, even with perfect accuracy in Alice’s measurement, NICD is impossible if Bob’s measurement is noisy.
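
The correlation 1 − p claimed for the naive protocol over the binary erasure model can be reproduced by writing out the joint output distribution. The sketch below is a small illustration; the function names `naive_erasure_protocol`, `correlation` and `bob_marginal` are hypothetical, and exact fractions are used so that the result can be compared without rounding.

```python
from fractions import Fraction


def naive_erasure_protocol(p):
    """Joint distribution {(x, y): prob} of the outputs of the naive protocol."""
    dist = {}
    for a in (0, 1):  # Alice's first input bit is uniform and she outputs it
        # with probability 1 - p Bob sees the same bit and outputs it
        dist[(a, a)] = dist.get((a, a), 0) + Fraction(1, 2) * (1 - p)
        # with probability p Bob sees the erasure symbol and outputs a fair coin
        for y in (0, 1):
            dist[(a, y)] = dist.get((a, y), 0) + Fraction(1, 2) * p * Fraction(1, 2)
    return dist


def correlation(dist):
    """2 * Prob[X = Y] - 1 for a joint distribution over bit pairs."""
    return 2 * sum(pr for (x, y), pr in dist.items() if x == y) - 1


def bob_marginal(dist):
    out = {}
    for (_, y), pr in dist.items():
        out[y] = out.get(y, 0) + pr
    return out
```

The agreement probability is (1 − p) + p/2, so the correlation works out to 1 − p for every p, and Bob's output stays uniform because his coin is fair.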

5 A One-Bit Communication Protocol 

We present a protocol that non-trivially distills correlation over the binary symmetric noise model with one bit of communication. Recall that over this model, no non-interactive, locally uniform protocol can have a correlation of more than $1 - 2p$. Now, we consider protocols with one bit of communication. Suppose Alice sends one bit to Bob, which Bob receives with perfect accuracy. With one bit of communication, Alice can generate an unbiased bit x and send it to Bob, and then Alice and Bob both output x. This protocol has perfect correlation. Thus, to make the problem non-trivial, we require that Alice and Bob must output two bits each. Suppose Alice outputs $(X_1, X_2)$ and Bob outputs $(Y_1, Y_2)$. We define the correlation of a protocol to be $2 \cdot \min_{i=1,2} \{\mathrm{Prob}[X_i = Y_i]\} - 1$. In this situation, we say a protocol is locally uniform if both $(X_1, X_2)$ and $(Y_1, Y_2)$ are uniformly distributed.

Now we describe a locally uniform protocol of correlation about $1 - 3p/2$. The protocol is called the “AND” protocol. Both Alice and Bob only take the first two bits as their inputs. Alice directly outputs her bits, and sends the AND of her bits to Bob. Then, intuitively, Bob “guesses” Alice’s bits using the Bayes rule and outputs them. A technical issue is that Bob has to “balance” his output so that the protocol is still locally uniform. The detailed description is presented below.



STEP I Alice computes $r := a_1 \wedge a_2$, sends r to Bob, and outputs $(a_1, a_2)$.
STEP II Bob, upon receiving r from Alice:

IF r = 1 THEN output (1, 1).

ELSE IF $b_1 = b_2 = 1$ THEN output

• (0, 0) with probability $p/(2 - p)$;

• (0, 1) with probability $(1 - p)/(2 - p)$;

• (1, 0) with probability $(1 - p)/(2 - p)$;

ELSE output $(b_1, b_2)$.



We can easily verify (by a straightforward computation) the following result. 
Theorem 3. The AND protocol is a locally uniform protocol with correlation 



This is a constant-factor improvement over the non-interactive case. 

This result may seem a little surprising. It appears that Alice does not fully utilize the one-bit communication, since she sends an AND of two bits, whose entropy is less than 1. It is tempting to speculate that by having Alice send the XOR of the two bits, Alice and Bob can obtain a better result, since Bob gets more information. Nevertheless, the XOR does not work, in some sense due to its “symmetry”. Consider the case where Alice sends the XOR of her bits to Bob. Bob can compute the XOR of his bits, and if the two XORs agree, Bob knows that with high probability, both his bits agree with Alice’s. However, if the two XORs don’t agree, Bob knows one of his bits is “corrupted,” but he has no information about which one. Furthermore, however Bob guesses, he will be wrong with probability 1/2. On the other hand, in the AND protocol, if Bob receives a “1” as the AND of the bits from Alice, he knows for sure that Alice has (1, 1) and thus he simply outputs (1, 1); if r = 0 and $b_1 = b_2 = 1$, he knows that his input is “corrupted”, and he “guesses” Alice’s bits according to the Bayes rule of posterior probabilities. If Bob receives a “0” as the AND and $(b_1, b_2) \ne (1, 1)$, then the data looks “consistent” and Bob just outputs his bits. In this way, 1/4 of the time (when Bob receives a 1), Bob knows Alice’s bits for sure and can achieve perfect correlation; otherwise Alice and Bob behave almost like in the non-interactive case, which gives $1 - 2p$ correlation. So the overall correlation is about $1/4 \cdot 1 + (3/4) \cdot (1 - 2p) = 1 - 3p/2$.
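
The “straightforward computation” behind the AND protocol can be carried out by enumerating the sixteen input combinations. The sketch below is an illustration with hypothetical function names (`and_protocol_distribution`, `correlation`, `marginal`), written directly from STEP I and STEP II as described; it uses exact fractions. Under this reading of the protocol the enumeration yields a correlation of 2(1 − p)² / (2 − p), which agrees with the estimate 1 − 3p/2 up to second-order terms in p. This closed form is the result of the computation here and is not quoted from the source.

```python
from fractions import Fraction
from itertools import product


def and_protocol_distribution(p):
    """Joint distribution {((x1, x2), (y1, y2)): prob} of the AND protocol outputs."""
    dist = {}

    def add(x, y, pr):
        if pr:
            dist[(x, y)] = dist.get((x, y), 0) + pr

    for a1, a2, b1, b2 in product((0, 1), repeat=4):
        # two independent pairs from the binary symmetric model:
        # a_k is uniform and b_k differs from a_k with probability p
        pr = Fraction(1, 4)
        pr *= (1 - p) if a1 == b1 else p
        pr *= (1 - p) if a2 == b2 else p
        x = (a1, a2)            # STEP I: Alice outputs her bits
        r = a1 & a2             # and sends their AND to Bob
        if r == 1:              # STEP II: Bob knows Alice holds (1, 1)
            add(x, (1, 1), pr)
        elif (b1, b2) == (1, 1):  # Bob's input is inconsistent with r = 0
            add(x, (0, 0), pr * p / (2 - p))
            add(x, (0, 1), pr * (1 - p) / (2 - p))
            add(x, (1, 0), pr * (1 - p) / (2 - p))
        else:                   # the data looks consistent
            add(x, (b1, b2), pr)
    return dist


def correlation(dist):
    """2 * min_i Prob[X_i = Y_i] - 1, the two-bit correlation used in Section 5."""
    agree = [sum(pr for (x, y), pr in dist.items() if x[i] == y[i]) for i in (0, 1)]
    return 2 * min(agree) - 1


def marginal(dist, who):
    """Marginal distribution of Alice's (who = 0) or Bob's (who = 1) output pair."""
    out = {}
    for pair, pr in dist.items():
        out[pair[who]] = out.get(pair[who], 0) + pr
    return out
```

For p = 1/10 the enumeration gives 81/95, compared with 1 − 2p = 4/5 for the best non-interactive protocol, and both output pairs come out uniform over the four two-bit strings, which is the role of Bob's balancing step.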
Abstract. The paper settles a long standing problem for Mazurkiewicz traces: the pure future local temporal logic defined with the basic modalities exists-next and until is expressively complete. The analogous result with a global interpretation was established some years ago by Thiagarajan and Walukiewicz (1997) and in its final form without any reference to past tense constants by Diekert and Gastin (2000). Each of the (previously known) global and the (new) local results generalizes Kamp’s Theorem for words, because for sequences local and global viewpoints coincide. But traces are labelled partial orders and then the difference between an interpretation globally over cuts (configurations) or locally at points (events) is significant. For global temporal logics the satisfiability problem is non-elementary (Walukiewicz 1998), whereas for local temporal logics both the satisfiability problem and the model checking problem are solvable in PSPACE (Gastin and Kuske 2003) as in the case of words. This makes local temporal logics much more attractive.

Keywords: Temporal logics, Mazurkiewicz traces, concurrency. 



1 Introduction 

In various applications, the behaviour of a concurrent process is not represented 
by a string, but more accurately by some labelled partial order. This led Mazur- 
kiewicz to the formulation of trace theory [14] which became a popular setting 
to study concurrency, see [7]. 

One advantage is that formal specifications of concurrent systems by temporal logic formulae have a direct (either global or local) interpretation for Mazurkiewicz traces. It is therefore no surprise that temporal logics for traces have received considerable attention, see [8,2,15,16,17,18,19]. In [20] (resp. finally in [4,6]) it was shown that the basic global temporal logic with future tense operators and with (resp. without) past tense constants is expressively complete with respect to the first order theory. However the satisfiability problem for these global logics is non-elementary [21]. The main reason for this high complexity is that the interpretation of a formula is defined with respect to a global configuration of the system, i.e., a finite prefix of the trace, and the prefix structure of traces is much more complex than in the case of linear orders (words). On the contrary, a local logic formula is evaluated at a local event of the system, i.e., at some vertex of the trace. The main advantage is that all local temporal logics over traces whose modalities are definable in monadic second order logic are decidable in PSPACE [10]. This is optimal since PSPACE-hardness occurs already for words.

The better complexity makes local temporal logic much more attractive; and 
several attempts were made to prove expressive completeness. In [5] expressive 
completeness for the basic pure future local temporal logic is established, if the 
underlying dependence alphabet is a cograph. Moreover, one can hope to go 
beyond cographs, only if each trace is equipped with some bottom element or if 
we allow past tense modalities. This second approach is used in [11,12] to obtain 
expressive completeness for all dependence alphabets. In [11], the full power of
exists-previous and since modalities equipped with filters is used. The result is 
improved in [12] where only past constants are necessary. Another temporal logic 
which is not local and based on more involved modalities (including both past 
tense and future tense) was shown to be expressively complete and decidable in 
PSPACE [1]. However, the most basic question remained open: whether expressive 
completeness holds for a pure future local temporal logic based upon exists-next 
and until, only. The present paper gives a positive answer to this question. 

Note that the focus of this paper is only to obtain the simplest possible pure
future and expressively complete local temporal logic. In order to express
properties of systems easily, one should instead introduce all convenient MSO
modalities, since the satisfiability and the model checking problems remain
decidable in PSPACE regardless of the fixed set of modalities used [10].

For lack of space we give only main ideas and skip several proofs including 
interesting new techniques in Section 5. They can be found in the full version. 



2 Preliminaries 

A dependence alphabet is a pair $(\Sigma, D)$ where the alphabet $\Sigma$ is a finite set of
actions and the dependence relation $D \subseteq \Sigma \times \Sigma$ is reflexive and symmetric. The
independence relation $I$ is the complement of $D$. For $a \in \Sigma$, the set of letters
dependent on $a$ is denoted by $D(a) = \{b \in \Sigma \mid (a,b) \in D\}$.

A Mazurkiewicz trace is an equivalence class of a labelled partial order
$t = [V, \le, \lambda]$ where $V$ is a set of vertices labelled by $\lambda : V \to \Sigma$ and $\le$ is a
partial order over $V$ satisfying the following conditions: For all $x \in V$, the
downward set $\downarrow x = \{y \in V \mid y \le x\}$ is finite, and for all $x, y \in V$,
$(\lambda(x), \lambda(y)) \in D$ implies $x \le y$ or $y \le x$, and $x \lessdot y$ implies
$(\lambda(x), \lambda(y)) \in D$, where $\lessdot \;=\; < \setminus <^2$ is the immediate successor relation
in $t$. For $x \in V$, we also define $\Downarrow x = \{y \in V \mid y < x\}$,
$\uparrow x = \{y \in V \mid x \le y\}$, and $\Uparrow x = \{y \in V \mid x < y\}$.

The trace $t$ is finite if $V$ is finite and we denote by $\mathbb{M}(\Sigma, D)$ (or simply $\mathbb{M}$)
the set of finite traces. By $\mathbb{R}(\Sigma, D)$ (or simply $\mathbb{R}$), we denote the set of finite or
infinite traces (also called real traces). Let $\mathrm{alph}(t) = \lambda(V)$ be the alphabet of $t$
and $\mathrm{alphinf}(t) = \{a \in \Sigma \mid \lambda^{-1}(a) \text{ is infinite}\}$ be the alphabet at infinity of $t$. For
$A \subseteq \Sigma$, we let $\mathbb{M}_A = \{t \in \mathbb{M} \mid \mathrm{alph}(t) \subseteq A\}$ and $\mathbb{R}_A = \{t \in \mathbb{R} \mid \mathrm{alph}(t) \subseteq A\}$.

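Let $t_1 = [V_1, \le_1, \lambda_1]$ and $t_2 = [V_2, \le_2, \lambda_2]$ be a pair of traces such that
$\mathrm{alphinf}(t_1) \times \mathrm{alph}(t_2) \subseteq I$. We then define the concatenation of $t_1$ and $t_2$ to
be $t_1 \cdot t_2 = [V, \le, \lambda]$ where $V = V_1 \cup V_2$ (assuming w.l.o.g. that $V_1 \cap V_2 = \emptyset$),
$\lambda = \lambda_1 \cup \lambda_2$ and $\le$ is the transitive closure of the relation
$\le_1 \cup \le_2 \cup\, (V_1 \times V_2 \cap \lambda^{-1}(D))$. The set $\mathbb{M}$ of finite traces is then a monoid with
the empty trace $1 = (\emptyset, \emptyset, \emptyset)$ as unit. The concatenation of two trace languages
$K, L \subseteq \mathbb{R}$ is $K \cdot L = \{r \cdot s \mid r \in K,\ s \in L \text{ and } \mathrm{alphinf}(r) \times \mathrm{alph}(s) \subseteq I\}$. We also use
the infinite product $t = \prod_{i \ge 0} t_i$ where $(t_i)_{i \ge 0} \subseteq \mathbb{R}$ is a sequence of real traces
such that $\mathrm{alphinf}(t_i) \times \mathrm{alph}(t_j) \subseteq I$ for all $i < j$.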

We denote by $\min(t)$ the set of minimal vertices of $t$. We let
$\mathbb{R}^1 = \{t \in \mathbb{R} \mid |\min(t)| = 1\}$ be the set of traces with exactly one minimal vertex.
To simplify the notation, we also use $\min(t)$ for the set $\lambda(\min(t))$ of labels of the
minimal vertices of $t$. What we actually mean is always clear from the context.
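
As an illustration of these definitions, the following minimal Python sketch builds the partial order of a finite trace from one of its representative words. It is not taken from the paper: the function names (`strict_order`, `canonical`, `min_labels`) and the three-letter alphabet are hypothetical, and the sketch relies on the standard fact that occurrences of dependent letters keep their relative order in every word representing the same trace.

```python
def strict_order(word, D):
    """Strict order < of the trace of `word`, as pairs (i, j) of positions."""
    n = len(word)
    # Dependent letters keep their relative order in every representative word.
    lt = {(i, j) for i in range(n) for j in range(i + 1, n)
          if (word[i], word[j]) in D}
    # Transitive closure; every edge goes from a smaller to a larger position.
    for k in range(n):
        for i in range(k):
            if (i, k) in lt:
                for j in range(k + 1, n):
                    if (k, j) in lt:
                        lt.add((i, j))
    return lt


def canonical(word, D):
    """Representation of the trace that does not depend on the chosen word."""
    count, names = {}, []
    for a in word:
        count[a] = count.get(a, 0) + 1
        names.append((a, count[a]))  # k-th occurrence of letter a
    order = {(names[i], names[j]) for i, j in strict_order(word, D)}
    return frozenset(names), frozenset(order)


def min_labels(word, D):
    """Labels of the minimal vertices, i.e. min(t) read as a set of letters."""
    lt = strict_order(word, D)
    return {word[j] for j in range(len(word))
            if not any((i, j) in lt for i in range(j))}


# Hypothetical dependence alphabet: a - b - c, with a and c independent.
SIGMA = "abc"
D = {(x, x) for x in SIGMA} | {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}
```

With this alphabet, the words `acb` and `cab` denote the same trace (they differ only by commuting the independent letters `a` and `c`), whereas `abc` and `bac` do not.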

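If $U \subseteq V$ is an interval ($\uparrow x \cap \downarrow y \subseteq U$ for all $x, y \in U$) then $[U, \le, \lambda]$ is a factor
of $t$. We often identify $U$ with $[U, \le, \lambda]$. In particular, if $x \in V$ then $\downarrow x$ and $\Downarrow x$
are prefixes of $t$, and $\uparrow x$ and $\Uparrow x$ are suffixes of $t$. For $A \subseteq \Sigma$, the maximal prefix
of $t$ using actions from $A$ only is $\mu_A(t) = \{x \in V \mid \lambda(\downarrow x) \subseteq A\}$.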



3 Local Temporal Logics 



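The basic syntax of linear temporal logic $\mathrm{LTL}_\Sigma = \mathrm{LocTL}_\Sigma[\mathsf{EX}, \mathsf{U}]$ is given by

$$\varphi ::= a \ (a \in \Sigma) \mid \neg\varphi \mid \varphi \vee \varphi \mid \mathsf{EX}\,\varphi \mid \varphi\,\mathsf{U}\,\varphi.$$

We give a standard locally defined semantics. Let $t \in \mathbb{R}$ be a real trace and
$x \in t$ be a vertex in $t$. We have:

$$\begin{array}{lcl}
t,x \models a & \text{if} & \lambda(x) = a \\
t,x \models \neg\varphi & \text{if} & t,x \not\models \varphi \\
t,x \models \varphi \vee \psi & \text{if} & t,x \models \varphi \text{ or } t,x \models \psi \\
t,x \models \mathsf{EX}\,\varphi & \text{if} & \exists y\ (x \lessdot y \text{ and } t,y \models \varphi) \\
t,x \models \varphi\,\mathsf{U}\,\psi & \text{if} & \exists z\ (x \le z \text{ and } t,z \models \psi \text{ and } \forall y\ (x \le y < z) \Rightarrow t,y \models \varphi).
\end{array}$$

We define some abbreviations. We write $\top$ for true, $\bot$ for false, and
$\mathsf{F}\,\varphi = \top\,\mathsf{U}\,\varphi$ means that $\varphi$ holds in the future. For $A \subseteq \Sigma$, we let $A = \bigvee_{a \in A} a$.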
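
To make the local semantics concrete, here is a small Python evaluator for finite traces. It is only a sketch under stated assumptions: traces are given by a representative word, formulae are encoded as nested tuples (an encoding chosen here, not in the paper), `EX` is read over immediate successors, and `U` is read reflexively (the witness $z$ may be the current vertex), which is how the semantics above is interpreted in this sketch.

```python
def holds(word, D, x, phi):
    """Decide t,x |= phi, where t is the finite trace of `word` and x a position."""
    n = len(word)
    # Strict order of the trace: dependent letters are ordered as in the word.
    lt = {(i, j) for i in range(n) for j in range(i + 1, n)
          if (word[i], word[j]) in D}
    for k in range(n):
        for i in range(k):
            for j in range(k + 1, n):
                if (i, k) in lt and (k, j) in lt:
                    lt.add((i, j))

    def le(u, v):
        return u == v or (u, v) in lt

    def covers(u, v):
        # v is an immediate successor of u
        return (u, v) in lt and not any((u, w) in lt and (w, v) in lt
                                        for w in range(n))

    def ev(u, f):
        op = f[0]
        if op == "true":
            return True
        if op == "atom":
            return word[u] == f[1]
        if op == "not":
            return not ev(u, f[1])
        if op == "or":
            return ev(u, f[1]) or ev(u, f[2])
        if op == "EX":
            return any(covers(u, y) and ev(y, f[1]) for y in range(n))
        if op == "U":
            return any(le(u, z) and ev(z, f[2]) and
                       all(ev(y, f[1]) for y in range(n)
                           if le(u, y) and (y, z) in lt)
                       for z in range(n))
        raise ValueError("unknown operator: %r" % (op,))

    return ev(x, phi)


def F(f):
    """Abbreviation F f = true U f."""
    return ("U", ("true",), f)


# Hypothetical dependence alphabet: a - b - c, with a and c independent.
D = {(s, s) for s in "abc"} | {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}
```

In the trace of `acb`, the `c`-vertex is concurrent to the `a`-vertex, so `F c` fails at the `a`-vertex although `c` occurs later in the word; this is the sense in which the logic is local.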

For $x \in t$ and $C \subseteq \Sigma$ with $C \times C \subseteq D$, we denote by $x_C$ the unique minimal
vertex of $\Uparrow x \cap \lambda^{-1}(C)$ if it exists, i.e., when $\Uparrow x \cap \lambda^{-1}(C) \ne \emptyset$. Note that $x < x_C$
if $x_C$ exists. If $C = \{c\}$ is a singleton, then we simply write $x_c$ instead of $x_{\{c\}}$.
We write $x_a \parallel x_b$, if both $x_a$ and $x_b$ exist, but neither $x_a \le x_b$ nor $x_a \ge x_b$.

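Let $\mathcal{C} \subseteq 2^{\Sigma} \setminus \{\emptyset\}$ be a covering of $\Sigma$ by (dependence-) cliques, this means that
$C \times C \subseteq D$ for all $C \in \mathcal{C}$, and for all $a \in \Sigma$, we have $a \in C$ for some $C \in \mathcal{C}$.
We consider the local temporal logic $\mathrm{LocTL}(\mathcal{C}) = \mathrm{LocTL}_\Sigma[(\mathsf{X}_a \le \mathsf{X}_b), \mathsf{X}_C, \mathsf{U}_C]$
whose syntax is given by

$$\varphi ::= a \mid (\mathsf{X}_a \le \mathsf{X}_b) \mid \neg\varphi \mid \varphi \vee \varphi \mid \mathsf{X}_C\,\varphi \mid \varphi\,\mathsf{U}_C\,\varphi$$

where $C$ ranges over $\mathcal{C}$ and $a, b$ range over $\Sigma$ with $\{a, b\} \not\subseteq C$ for all $C \in \mathcal{C}$. If
$C = \{a\}$ is a singleton, then we simply write $\mathsf{X}_a$ and $\mathsf{U}_a$ for $\mathsf{X}_{\{a\}}$ and $\mathsf{U}_{\{a\}}$. The
semantics of $\mathrm{LocTL}(\mathcal{C})$ is defined as follows. First, $\varphi\,\mathsf{U}_C\,\psi = (\varphi \vee \neg C)\,\mathsf{U}\,(C \wedge \psi)$.
Then, $t,x \models \mathsf{X}_C\,\varphi$ if $x_C$ exists and $t,x_C \models \varphi$. Finally, $t,x \models (\mathsf{X}_a \le \mathsf{X}_b)$ if $x_a, x_b$
exist and $x_a \le x_b$. If $a, b \in C$ then we have $(\mathsf{X}_a \le \mathsf{X}_b) = \mathsf{X}_b \top \wedge \mathsf{X}_C(\neg b\,\mathsf{U}_C\,a)$.
Hence, we can freely use in $\mathrm{LocTL}(\mathcal{C})$ all constants $(\mathsf{X}_a \le \mathsf{X}_b)$ with $a, b \in \Sigma$.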

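We show that $\mathrm{LocTL}(\mathcal{C})$ is a fragment of $\mathrm{LocTL}_\Sigma[\mathsf{EX}, \mathsf{U}]$. First, we have
$(\mathsf{X}_a \le \mathsf{X}_b) = \bigvee_{c \in \Sigma} (\mathsf{X}_c \le \mathsf{X}_a) \wedge (\mathsf{X}_c \le \mathsf{X}_b) \wedge \mathsf{EX}(c \wedge \neg(\neg a\,\mathsf{U}\,b))$. Thus, it is enough
to consider a conjunction $(\mathsf{X}_c \le \mathsf{X}_a) \wedge \mathsf{EX}\,c$. For $\lambda(x) = a$, this is $\mathsf{EX}(c \wedge \mathsf{F}\,a)$. For
$\lambda(x) \ne a$, this is $\mathsf{EX}(c \wedge \mathsf{F}\,a) \wedge \neg(\neg c\,\mathsf{U}\,a)$.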

For the modality Xc, we have Xc p = Vcec(^c<f3 A AdeC\{c} 
and XcP = (-ic A (_L Uc p)) V (c A EX(T Uc p))- 

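We may also use $\varphi\,\mathsf{XU}_C\,\psi = \mathsf{X}_C(\varphi\,\mathsf{U}_C\,\psi)$. Note that the modalities $\mathsf{X}_a$ and
$\mathsf{U}_a$ can be expressed in all logics $\mathrm{LocTL}(\mathcal{C})$: let $C \in \mathcal{C}$ such that $a \in C$, we have
$\mathsf{X}_a\,\varphi = \neg a\,\mathsf{XU}_C\,(a \wedge \varphi)$ and $\varphi\,\mathsf{U}_a\,\psi = (\neg a \vee \varphi)\,\mathsf{U}_C\,(a \wedge \psi)$.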

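When $b, c \in \Sigma$ are such that $\Uparrow x \cap \lambda^{-1}(b) \ne \emptyset$ and $\Uparrow x_b \cap \lambda^{-1}(c) \ne \emptyset$, we
let $x_{bc} = (x_b)_c$ be the minimal vertex of $\Uparrow x_b \cap \lambda^{-1}(c)$. We now define constants
$(\mathsf{X}_{ac} = \mathsf{X}_{bc})$ for all $a, b, c \in \Sigma$ with $a \ne c \ne b$ by: $t,x \models (\mathsf{X}_{ac} = \mathsf{X}_{bc})$, if
$x_{ac}, x_{bc}$ exist and $x_{ac} = x_{bc}$. It is far from being obvious that the new
constants $(\mathsf{X}_{ac} = \mathsf{X}_{bc})$ can be expressed in $\mathrm{LocTL}_\Sigma[\mathsf{EX}, \mathsf{U}]$. We will devote
Section 7 to the proof of the next lemma.

Lemma 1. The constants $(\mathsf{X}_{ac} = \mathsf{X}_{bc})$ can be expressed in $\mathrm{LocTL}(\mathcal{C})$ for all
$a, b, c \in \Sigma$ with $a \ne c \ne b$ and all coverings $\mathcal{C}$ of $\Sigma$ by cliques.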

4 Lifting Lemma 

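In this section $A$ denotes a subset of $\Sigma$ and we let $\overline{A} = \Sigma \setminus A$ be its complement.
For $x \in t \in \mathbb{R}$ we define $\mu_A(x, t)$ to be the prefix of $\uparrow x$ which is given by the set
of vertices $\{z \in t \mid x \le z \text{ and } \forall x < y \le z,\ \lambda(y) \in A\}$.

Lemma 2. Let $x \in t \in \mathbb{R}$ and $a \in \Sigma$. Then,

$x_a$ exists and $x_a \in \mu_A(x, t)$ if and only if $t,x \models \mathsf{X}_a \top \wedge \bigwedge_{c \in \overline{A}} \neg(\mathsf{X}_c \le \mathsf{X}_a)$.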

Lemma 2 is easy to show. The aim of this section is to establish the following. 



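Theorem 3 (Lifting Lemma). Let $\varphi \in \mathrm{LocTL}_\Sigma[(\mathsf{X}_a \le \mathsf{X}_b), \mathsf{X}_a, \mathsf{U}_a]$ and $A \subseteq \Sigma$.
Then we effectively find a formula $\overline{\varphi}^A \in \mathrm{LocTL}_\Sigma[(\mathsf{X}_a \le \mathsf{X}_b), \mathsf{X}_a, \mathsf{U}_a]$ such
that for all $x \in t \in \mathbb{R}$ we have: $\mu_A(x, t), x \models \varphi$ if and only if $t,x \models \overline{\varphi}^A$.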

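The proof is done by structural induction on $\varphi$. We start with the following
observations: $\overline{a}^A = a$ for all $a \in \Sigma$, $\overline{\varphi \vee \psi}^A = \overline{\varphi}^A \vee \overline{\psi}^A$, and
$\overline{\neg\varphi}^A = \neg\,\overline{\varphi}^A$.

Now, $\mu_A(x,t), x \models (\mathsf{X}_a \le \mathsf{X}_b)$ if and only if both $t,x \models (\mathsf{X}_a \le \mathsf{X}_b)$ and
$x_b \in \mu_A(x,t)$. However, $x_b \in \mu_A(x,t)$ can be expressed using Lemma 2.

Define $\varphi\,\mathsf{XU}_a\,\psi$ as $\mathsf{X}_a(\varphi\,\mathsf{U}_a\,\psi)$. Then, both $\mathsf{X}_a$ and $\mathsf{U}_a$ can be expressed in
$\mathsf{XU}_a$. Indeed, $\mathsf{X}_a\,\varphi = \bot\,\mathsf{XU}_a\,\varphi$ and $\varphi\,\mathsf{U}_a\,\psi = (a \wedge \psi) \vee ((\neg a \vee \varphi) \wedge \varphi\,\mathsf{XU}_a\,\psi)$. Thus
it is enough to define $\overline{\varphi\,\mathsf{XU}_a\,\psi}^A$. This is the difficult part for which we establish
first some auxiliary results.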
Lemma 4. Let $x \in t \in \mathbb{R}$ and $a \in \Sigma$ such that $x_a$ exists and $x_a \in \mu_A(x,t)$.
Define $B = \{a\} \cup \{b \in A \setminus \{a\} \mid t,x \models \bigwedge_{c \in \overline{A}} \neg(\mathsf{X}_{ab} = \mathsf{X}_{cb})\}$.

Then we have $a \in B \subseteq A$ and $\mu_A(x,t) \cap \uparrow x_a = \mu_B(x_a,t)$.

As a consequence of Lemmata 4 and 2 we obtain the following proposition. 

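Proposition 5. Let $a \in B \subseteq A$. Then, there exists a formula $\mathrm{Switch}_{A,B,a} \in
\mathrm{LocTL}_\Sigma[(\mathsf{X}_d \le \mathsf{X}_e), (\mathsf{X}_{df} = \mathsf{X}_{ef})]$ such that the following two assertions hold.

1. If $t,x \models \mathrm{Switch}_{A,B,a}$ then $x_a \in \mu_A(x,t)$ exists and $\mu_A(x,t) \cap \uparrow x_a = \mu_B(x_a,t)$.

2. If $x_a \in \mu_A(x,t)$ exists, then we have $t,x \models \mathrm{Switch}_{A,B,a}$ for some $a \in B \subseteq A$.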

Using an induction on the size of A, the remaining case of the proof of 
Theorem 3 follows easily from the following. 



Lemma 6. We have (p XUa 4> = a\\! where 

ai= \J Switch^.B.a A Xa , 

BCA 

<J2 = SwitchA.^.a A (^(SwitchA.^.a A XUo ( V A cri))^ 



5 Expressive Completeness of LocTL(C) 

In the following, if a real trace $t \in \mathbb{R}$ has a unique minimal vertex $x$, then by
$t \models \varphi$ we mean $t,x \models \varphi$. Hence, if $t \in \mathbb{R}$ and $x \in t$ is any vertex, then $t,x \models \varphi$
has the same meaning as $\uparrow x \models \varphi$ (if the reference to $t$ is clear).

Now, we want to define initial satisfiability, i.e., when a trace $t \in \mathbb{R}$
satisfies a local temporal logic formula $\varphi$. Since a trace $t$ does not necessarily
have a unique minimal position, there is no canonical way to choose an initial
position in $t$. Our approach uses rooted traces as in e.g. [3]. Let $\# \notin \Sigma$ and
$t = [V, \le, \lambda] \in \mathbb{R}(\Sigma, D)$. The rooted trace associated with $t$ is

$$\#t = [V \cup \{\#\},\ \le \cup\ (\{\#\} \times (V \cup \{\#\})),\ \lambda \cup (\# \mapsto \#)].$$

It is a trace over the alphabet $\Sigma' = \Sigma \cup \{\#\}$ and the dependence relation
$D' = D \cup (\{\#\} \times \Sigma') \cup (\Sigma' \times \{\#\})$. Then, for a formula $\varphi \in \mathrm{LocTL}(\mathcal{C})$, we define
$\mathcal{L}_\Sigma(\varphi) = \{t \in \mathbb{R}(\Sigma, D) \mid \#t \models \varphi\}$. We simply write $\mathcal{L}(\varphi)$ when there is no
ambiguity on the alphabet.

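Alphabetic conditions can be easily expressed in $\mathrm{LocTL}(\mathcal{C})$. Therefore, for
$A \subseteq \Sigma$, the languages $\mathbb{M}_A$, $\mathbb{R}_A$, $(\mathrm{alphinf} = A) = \{t \in \mathbb{R} \mid \mathrm{alphinf}(t) = A\}$ and
$(\min \subseteq A) = \{t \in \mathbb{R} \mid \min(t) \subseteq A\}$ are definable in $\mathrm{LocTL}(\mathcal{C})$.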

The first order theory of traces $\mathrm{FO}_\Sigma(<)$ is given by the syntax:

$$\varphi ::= P_a(x) \mid x < y \mid \neg\varphi \mid \varphi \vee \varphi \mid \exists x\,\varphi,$$

where $a \in \Sigma$ and $x, y \in \mathrm{Var}$ are first order variables. Given a trace $t = [V, \le, \lambda]$,
we interpret each predicate $P_a$ by the set $\{x \in V \mid \lambda(x) = a\}$ and the relation $<$
as the strict partial order relation of $t$. The semantics then lifts to all formulas
as usual. For closed formulae we can define as usual the language
$\mathcal{L}(\varphi) = \{t \in \mathbb{R} \mid t \models \varphi\}$. We say that a trace language $L \subseteq \mathbb{R}$ is expressible in $\mathrm{FO}_\Sigma(<)$ if
there exists a sentence $\varphi \in \mathrm{FO}_\Sigma(<)$ such that $L = \mathcal{L}(\varphi)$.
Theorem 7. Let $\mathcal{C}$ be a covering of $\Sigma$ by cliques of $(\Sigma, D)$. A real trace language
over $\mathbb{R}(\Sigma, D)$ is expressible in $\mathrm{FO}_\Sigma(<)$ if and only if it is expressible in
$\mathrm{LocTL}(\mathcal{C})$.

Corollary 8. The local temporal logic $\mathrm{LocTL}_\Sigma[\mathsf{EX}, \mathsf{U}]$ based on the modalities
$\mathsf{EX}$ and $\mathsf{U}$ is expressively complete.

The logic $\mathrm{LocTL}(\mathcal{C})$ is a fragment of $\mathrm{LocTL}_\Sigma[\mathsf{EX}, \mathsf{U}]$ and by its semantics
it is clear that each real trace language expressible in $\mathrm{LocTL}_\Sigma[\mathsf{EX}, \mathsf{U}]$ is also
expressible in $\mathrm{FO}_\Sigma(<)$. Therefore it is enough to prove the other direction of
Theorem 7.

We use the algebraic notion of recognizability. Let $h : \mathbb{M} \to M$ be a morphism
to a finite monoid $M$. For $s, t \in \mathbb{R}$, we say that $s$ and $t$ are $h$-similar, denoted
$s \sim_h t$, if we can write $s = \prod_{i \ge 0} s_i$ and $t = \prod_{i \ge 0} t_i$ with $s_i, t_i \in \mathbb{M}$ and
$h(s_i) = h(t_i)$ for all $i \ge 0$. The transitive closure $\approx_h$ of $\sim_h$ is an equivalence
relation. For $t \in \mathbb{R}$, we denote by $[t]_{\approx_h}$ the equivalence class of $t$. When there is
no ambiguity, we simply write $\approx$ and $[t]$. Since $M$ is finite, the equivalence
relation $\approx$ is of finite index with at most $|M|^2 + |M|$ equivalence classes. A trace
language $L \subseteq \mathbb{R}$ is recognized by $h$ if it is saturated by $\approx$ (or equivalently by $\sim$),
i.e., $t \in L$ implies $[t] \subseteq L$ for all $t \in \mathbb{R}$.

A finite monoid $M$ is aperiodic if there is an $n \ge 0$ such that $u^n = u^{n+1}$
for all $u \in M$. A trace language $L \subseteq \mathbb{R}$ is aperiodic if it is recognized by some
morphism to a finite and aperiodic monoid.
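
The aperiodicity condition can be checked mechanically on a multiplication table. The following Python sketch assumes a finite monoid whose elements are numbered 0..k-1 and whose product is given as a table; the function name `is_aperiodic` and the two example tables are illustrative only. It uses the fact that, in a monoid with k elements, the powers of u become stationary within k steps exactly when $u^n = u^{n+1}$ holds for some n.

```python
def is_aperiodic(table):
    """table[u][v] is the product u*v in a finite monoid on elements 0..k-1."""
    k = len(table)
    for u in range(k):
        p = u  # p runs through u^1, u^2, ...
        for _ in range(k):
            q = table[p][u]  # next power of u
            if q == p:       # u^n = u^(n+1): the powers of u are stationary
                break
            p = q
        else:
            return False     # the powers of u cycle with period > 1
    return True


# The two-element monoid {0, 1} under multiplication is aperiodic ...
U1 = [[0, 0],
      [0, 1]]
# ... whereas the group Z/2Z (addition modulo 2) is not.
Z2 = [[0, 1],
      [1, 0]]
```

A non-trivial group inside the monoid is precisely what makes the check fail, as the second table shows.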

Theorem 9 ([8,9]). A language $L \subseteq \mathbb{R}(\Sigma, D)$ is expressible in $\mathrm{FO}_\Sigma(<)$ if and
only if it is aperiodic.

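To prove that aperiodic trace languages are expressible in $\mathrm{LocTL}(\mathcal{C})$, we
use an induction on $\Sigma$. If $\Sigma = \emptyset$ then there are only two trace languages: $\emptyset$
and $\mathbb{R} = \{1\}$ which are respectively defined by $\bot$ and $\top$. Assume now that
$\Sigma \ne \emptyset$ and fix a covering $\mathcal{C}$ of $\Sigma$ by cliques of $(\Sigma, D)$. By induction, each
aperiodic language $L \subseteq \mathbb{R}_A$ with $A \subsetneq \Sigma$ is expressible in $\mathrm{LocTL}(\mathcal{C}|_A)$, where
$\mathcal{C}|_A = \{C \cap A \mid C \in \mathcal{C}\} \setminus \{\emptyset\}$. In the following, we fix some $C \in \mathcal{C}$ and we let
$A = \Sigma \setminus C \subsetneq \Sigma$. We use the unambiguous decomposition $\mathbb{R} = \mathbb{R}_A (\min \subseteq C)$.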

Lemma 10. Let $L \subseteq \mathbb{R}$ be a trace language recognized by $h$. Then, $L$ is a finite
union of languages of the form $(L_1 \cap \mathbb{R}_A)(L_2 \cap (\min \subseteq C))$, where the languages
$L_1, L_2 \subseteq \mathbb{R}$ are recognized by $h$.

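Let $T = \{[t] \mid t \in \mathbb{R}^1 \cap C\,\mathbb{R}_A\}$. We consider $T$ as a finite alphabet. Each trace
$t \in (\min \subseteq C)$ has a unique $C$-factorization $t = \prod_{i < n} t_i$ with $n \in \mathbb{N} \cup \{\omega\}$ and
$t_i \in \mathbb{R}^1 \cap C\,\mathbb{R}_A$ for all $i < n$. Hence, we can define a mapping $\sigma : (\min \subseteq C) \to T^\infty$
by $\sigma(t) = \prod_{i < n} [t_i]$ where $\prod_{i < n} t_i$ is the $C$-factorization of $t$.

Lemma 11. Let $L \subseteq \mathbb{R}$ be recognized by $h$. Then, $L \cap (\min \subseteq C) = \sigma^{-1}(K)$ for
some aperiodic word language $K \subseteq T^\infty$.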

The next lemma uses a classical result that aperiodic word languages $K \subseteq T^\infty$
are expressible in $\mathrm{LTL}_T[\mathsf{XU}]$. This result is based on Kamp’s Theorem [13].

Lemma 12. Suppose that each aperiodic trace language over $A$ is expressible
in $\mathrm{LocTL}(\mathcal{C}|_A)$. Let $K \subseteq T^\infty$ be an aperiodic word language. There exists
$\varphi \in \mathrm{LocTL}(\mathcal{C})$ such that for all $t \in (\min \subseteq C) \setminus \{1\}$, we have $\sigma(t) \in K$ if and only if
$t \models \varphi$.

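We now turn to the proof of Theorem 7. Using Lemma 10, we have to show
that if $L_1$ and $L_2$ are recognized by $h$ then $L = (L_1 \cap \mathbb{R}_A)(L_2 \cap (\min \subseteq C))$ is
expressible in $\mathrm{LocTL}(\mathcal{C})$.

Since $L_1 \cap \mathbb{R}_A$ is aperiodic, using the induction on the alphabet, we find
$\varphi_1 \in \mathrm{LocTL}(\mathcal{C}|_A)$ such that for all $t_1 \in \mathbb{R}_A$, $t_1 \in L_1$ iff $\#t_1 \models \varphi_1$. Let
$\overline{\varphi_1}^A \in \mathrm{LocTL}(\mathcal{C})$ be the formula given by the Lifting Lemma (Theorem 3). For all
$t \in \mathbb{R}$, we have

$\#t \models \overline{\varphi_1}^A$ iff $\#\mu_A(t) \models \varphi_1$ iff $\mu_A(t) \in L_1$.

Since $L_2$ is recognized by $h$, using Lemma 11 we have $L_2 \cap (\min \subseteq C) = \sigma^{-1}(K)$
for some aperiodic word language $K \subseteq T^\infty$. By Lemma 12, we find
$\varphi_2 \in \mathrm{LocTL}(\mathcal{C})$ such that for all $t_2 \in (\min \subseteq C) \setminus \{1\}$ we have $t_2 \models \varphi_2$ iff
$\sigma(t_2) \in K$ iff $t_2 \in L_2$. Let $\psi = \neg\mathsf{X}_C \top \vee \mathsf{X}_C\,\varphi_2$ if $1 \in L_2$ and $\psi = \mathsf{X}_C\,\varphi_2$
otherwise.

We claim that $L = \mathcal{L}_\Sigma(\varphi)$ where $\varphi = \overline{\varphi_1}^A \wedge \psi$.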



6 Process Based Logics Are Expressively Complete 

We show that we can deal also with process-based logics as introduced in [19].
In this framework, we start with a finite set of processes $\mathcal{P} = \{1, \ldots, n\}$ and
a mapping $p : \Sigma \to 2^{\mathcal{P}} \setminus \{\emptyset\}$. The execution of an action $a \in \Sigma$ requires the
participation of all processes in the nonempty set $p(a)$. If $p(a) = \{i\}$ is a singleton
then the action $a$ is local to process $i$. Otherwise, the execution of $a$ requires the
synchronization of all processes in $p(a)$. The dependence relation is
$D = \{(a, b) \in \Sigma^2 \mid p(a) \cap p(b) \ne \emptyset\}$. Hence the set $\mathcal{C} = \{p^{-1}(i) \mid i \in \mathcal{P}\}$ is a covering of $\Sigma$ by
cliques of $(\Sigma, D)$.
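
The passage from a location map to a dependence alphabet and its clique covering is easy to spell out. The sketch below is an illustration only: the dictionary `loc` stands for the mapping $p$, and the three actions and two processes are a made-up example, not one from the paper.

```python
def dependence_from_locations(loc):
    """D = {(a, b) | p(a) and p(b) share a process}; `loc` maps actions to process sets."""
    return {(a, b) for a in loc for b in loc if loc[a] & loc[b]}


def clique_covering(loc):
    """For each process i, the clique p^{-1}(i) of all actions in which i takes part."""
    processes = set().union(*loc.values())
    return {i: {a for a in loc if i in loc[a]} for i in processes}


# Hypothetical system: a is local to process 1, c is local to process 2,
# and b synchronises both processes.
loc = {"a": {1}, "b": {1, 2}, "c": {2}}
D = dependence_from_locations(loc)
cover = clique_covering(loc)
```

Here `a` and `c` are independent because they share no process, and every action belongs to at least one clique since each $p(a)$ is nonempty.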

Thanks to this more concrete view of the dependence alphabet based on
processes, we can define temporal modalities that involve locations of actions.
In [19], the formula $\mathsf{O}_i\,\varphi$ means that $\varphi$ holds at the first event of process $i$ that
is not in the past of the current vertex. Clearly, this is not a future modality.
Here, we use a future variant $\mathsf{X}_i\,\varphi$ meaning that $\varphi$ holds at the first event of
process $i$ which is strictly above the current vertex. More formally, we define
$\mathsf{X}_i\,\varphi := \mathsf{X}_{p^{-1}(i)}\,\varphi$. The until modality introduced in [19] is also not pure future.
Here we use a future variant $\varphi\,\mathsf{U}_i\,\psi$ which means that on the sequence of events
located on process $i$ and above the current vertex we observe $\varphi$ until $\psi$. More
formally, we define $\varphi\,\mathsf{U}_i\,\psi := \varphi\,\mathsf{U}_{p^{-1}(i)}\,\psi$.

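Since the set $\mathcal{C} = \{p^{-1}(i) \mid i \in \mathcal{P}\}$ is a covering of $\Sigma$ by cliques of $(\Sigma, D)$, a
reformulation of Theorem 7 yields

Theorem 13. Let $\mathcal{P}$ be a finite set of processes and $p : \Sigma \to 2^{\mathcal{P}} \setminus \{\emptyset\}$ be a
location map. The process-based local logic $\mathrm{LocTL}[(\mathsf{X}_a \le \mathsf{X}_b), \mathsf{X}_i, \mathsf{U}_i]$ based on
the modalities $\mathsf{X}_i$ and $\mathsf{U}_i$ for $i \in \mathcal{P}$ and using only constants $(\mathsf{X}_a \le \mathsf{X}_b)$ with
$p(a) \cap p(b) = \emptyset$ is expressively complete.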
7 Removing Constants: Proof of Lemma 1

We prove Lemma 1 by showing how to express the constants $(\mathsf{X}_{ac} = \mathsf{X}_{bc})$ in
terms of a Boolean combination of formulae of type $(\mathsf{X}_d \le \mathsf{X}_e)$, $\mathsf{X}_d$, and $\mathsf{U}_d$
for various $d, e \in \Sigma$. Note that the constants $(\mathsf{X}_d < \mathsf{X}_e)$ and $(\mathsf{X}_d \parallel \mathsf{X}_e)$ with
obvious semantics may be used, too. The overall strategy is to proceed in $O(n^3)$
rounds where $n = |\Sigma|$. In each round we introduce new formulae which are
approximations of $(\mathsf{X}_{ac} = \mathsf{X}_{bc})$. At the end these approximations are getting
so weak that we can replace them by false. In each round, when we replace an
approximation we obtain a new formula of size $O(n^2)$. Thus, overall $(\mathsf{X}_{ac} = \mathsf{X}_{bc})$
is replaced by a complex formula of exponential size in $|\Sigma|$.

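Lemma 14. 1. Let $z$ be a vertex such that $\lambda(z) = a$ and $z_c$ exists. There exist
letters $\{a_1, \ldots, a_{k-1}\} \subseteq \Sigma \setminus \{a, c\}$ such that $z < z_{a_1} < \cdots < z_{a_{k-1}} < z_c$ and
$a = a_0 - a_1 - \cdots - a_{k-1} - a_k = c$ in $(\Sigma, D)$.

2. Let $x$ be a vertex and $\{a_1, \ldots, a_{k-1}\} \subseteq \Sigma \setminus \{a, c\}$ such that
$x_a < x_{aa_1} < \cdots < x_{aa_{k-1}} < x_{ac}$ and $a = a_0 - a_1 - \cdots - a_{k-1} - a_k = c$ in $(\Sigma, D)$.
If $x_a \parallel x_c$, then $x_{aa_i} = x_{ca_i}$ for some $1 \le i < k$.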

Let $a, c \in \Sigma$, $a \ne c$, and let $t \in \mathbb{R}$, $x \in t$ such that $x_{ac}$ exists. Define $\delta_x(a, c)$
as the smallest integer $k \ge 1$ such that there exist letters $a_1, \ldots, a_{k-1}$ such that
$x_a < x_{aa_1} < \cdots < x_{aa_{k-1}} < x_{ac}$ and $a = a_0 - a_1 - \cdots - a_{k-1} - a_k = c$ in
$(\Sigma, D)$. Note that such an integer $k$ exists by Lemma 14 and $\delta_x(a, c) \le |\Sigma| - 1$.

We also introduce the set $\Gamma_x(a, c)$ which consists of all pairs $(d, e)$, $d \ne e$,
such that either $x_{de}$ does not exist or $x_{ac} \le x_{de}$. Note that $|\Gamma_x(a, c)| \le |\Sigma|^2 - |\Sigma|$.
Throughout we use the following fact:

if $x < y$ and $y_{fg} \le x_{ac}$, then $\Gamma_x(a, c) \subseteq \Gamma_y(f, g)$. (*)



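Proposition 15. Let $a, b, c \in \Sigma$ with $a \ne c \ne b$. For each triple $(m, \ell, r)$ with
$0 \le m \le |\Sigma|^2 - |\Sigma|$, $0 \le \ell \le 2|\Sigma| - 2$, and $r \in \{0, 1\}$ we can define a formula
$(\mathsf{X}_{ac} = \mathsf{X}_{bc}, m, \ell, r)$ in terms of $(\mathsf{X}_d \le \mathsf{X}_e)$, $\mathsf{X}_d$ and $\mathsf{U}_d$ with $d, e \in \Sigma$ such that
for all $x \in t \in \mathbb{R}$ the following assertions I and II are satisfied.

I: If $t,x \models (\mathsf{X}_{ac} = \mathsf{X}_{bc}, m, \ell, r)$, then $t,x \models (\mathsf{X}_{ac} = \mathsf{X}_{bc})$.

II: If the following four conditions $C_1, \ldots, C_4$ are simultaneously satisfied, then
it holds: $t,x \models (\mathsf{X}_{ac} = \mathsf{X}_{bc}, m, \ell, r)$.

$C_1$: $t,x \models (\mathsf{X}_{ac} = \mathsf{X}_{bc})$.

$C_2$: $|\Gamma_x(a,c)| = |\Gamma_x(b,c)| \ge m$.

$C_3$: $\delta_x(a,c) + \delta_x(b,c) \le \ell$.

$C_4$: $r = 1$ or $t,x \models (\mathsf{X}_a \parallel \mathsf{X}_b) \wedge \neg[(\mathsf{X}_c < \mathsf{X}_a) \wedge (\mathsf{X}_c < \mathsf{X}_b)]$.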



Corollary 16. The formulae $(\mathsf{X}_{ac} = \mathsf{X}_{bc})$ and $(\mathsf{X}_{ac} = \mathsf{X}_{bc}, 0, 2|\Sigma| - 2, 1)$ are
equivalent.
Proof of Prop. 15. For $a = b$ we define $(\mathsf{X}_{ac} = \mathsf{X}_{bc}, m, \ell, r)$ by the formula
$\mathsf{X}_a \mathsf{X}_c \top$ which simply states that $x_{ac}$ exists. Obviously, I and II are both satisfied
for $a = b$. Hence in the following we may assume $|\{a, b, c\}| = 3$. Consider a
triple {m,£,r). If now either m > — jifl — 2 or ^ < I, then we define 

$(\mathsf{X}_{ac} = \mathsf{X}_{bc}, m, \ell, r)$ by false. Then, I and II hold.

In the following we may assume by induction that formulae are defined
satisfying both I and II for all triples $(m', \ell', r')$ where either $m' > m$, or $m' = m$ and
$\ell' < \ell$, or $m' = m$, $\ell' = \ell$, and $r' < r$.

Case r = \ : We define (Xoc = 1^bc,'m,£, 1) by v3o V V (^2 V 933 where: 

V30 = (Xa < Xb) A Xa(Xfc < Xc), = (X, < Xa) A Xb(Xa < Xc), 

P2 = (Xac = X{,c,TO,-^,0), V53 = (Xa || Xfc) A '01 A '02, 

01 = (Xc < Xa) A (Xc < Xfe), 02 = 01 Uc ((Xac = X{,c,TO,^,0) A -'0l). 

Case r = 0: We define (Xac = X^c, m, £, 0) by tq V Ti V T 2 V T3 where: 

ro = (Xa<Xc)A(X, <Xc), 

n = (Xc<Xa)A \J r(6, &') A Xc(Xac = Xf,/c,TO,.^- 1, 1), 
b^b'^c 

T2 = (Xc < Xfe) A \J r(a, a') AXc(Xa'c = Xf,c,m,^- 1,1), 
a^a' 

T3 = \J r(a,a') A r(6, &') A Xc(Xa'c = X{,/c,m,£ - 2, 1), 

a^a' 

b^b'^c 

T{d, d') = (Xdd' = Xcd', m -I- 2, 21271 — 2, 1) A < X^). 

We only give the proof of assertion II. Consider $x \in t \in \mathbb{R}$ such that $C_1, \ldots,
C_4$ are all satisfied. In particular, $x_{ac}, x_{bc}$ exist and we have $x_{ac} = x_{bc}$. If $x_a < x_c$
then $x_c = x_{ac} = x_{bc}$ hence also $x_b < x_c$ and $t,x \models \tau_0$. Similarly, if $x_b < x_c$ then
$t,x \models \tau_0$. Hence in the following we assume that neither $x_a < x_c$ nor $x_b < x_c$.

There are three cases: 1) $x_c < x_a$, 2) $x_c < x_b$, and 3) neither $x_c < x_a$ nor
$x_c < x_b$. These cases correspond to $\tau_1$, $\tau_2$, and $\tau_3$ respectively. Since $r = 0$, $C_4$
implies $x_a \parallel x_b$ and $\neg(x_c < x_a \wedge x_c < x_b)$. Hence, in case 1, using $\neg(x_b < x_c)$ and
$b \ne c$, we get $x_b \parallel x_c$. Similarly, in case 2 we have $x_a \parallel x_c$ and in case 3 we have
both $x_a \parallel x_c$ and $x_b \parallel x_c$. One can prove the following.

Claim. If Xa || Xc then we find a' G 27 \ {a, c} such that both 6y{a', c) < Sx{a, c) 
and t,x \= r{a, a'). 

We come back to the proof of the three cases. We start with case 2). We have
$x_c < x_b$ and $x_a \parallel x_c$. Let $a'$ be given by the claim stated above, and let $y = x_c$.
We can show that $C_1, \ldots, C_4$ hold for $y$, $(a', b, c)$ and $(m, \ell - 1, 1)$. By induction,
we get $t,y \models (\mathsf{X}_{a'c} = \mathsf{X}_{bc}, m, \ell - 1, 1)$ and therefore, $t,x \models \tau_2$.

Case 1) is symmetrical. For case 3), we apply twice the claim in order to get
$a'$ and $b'$. We can show that $C_1, \ldots, C_4$ hold for $y = x_c$, $(a', b', c)$ and $(m, \ell - 2, 1)$.
By induction, we get $t,y \models (\mathsf{X}_{a'c} = \mathsf{X}_{b'c}, m, \ell - 2, 1)$ and therefore, $t,x \models \tau_3$. □
Dedicated to Imre Simon on the Occasion of his 60th Birthday

Abstract. In this paper we investigate how it is possible to recover 
an automaton from a rational expression that has been computed from 
that automaton. The notion of derived term of an expression, introduced 
by Antimirov, appears to be instrumental in this problem. The second 
important ingredient is the co-minimization of an automaton, a dual 
and generalized Moore algorithm on non-deterministic automata. If an 
automaton is then sufficiently “decorated” , the combination of these two 
algorithms gives the desired result. Reducing the amount of “decoration” 
is still the object of ongoing investigation. 



1 A Natural Question 

Kleene’s theorem states the equality of two families of languages: the family
of languages described by rational (i.e. regular) expressions coincides with the
family of languages accepted (or recognized) by finite automata, an equality which
is often written as: $\mathrm{Reg}\,A^* = \mathrm{Rec}\,A^*$. Its proof amounts to showing the two
inclusions:

$\mathrm{Rec}\,A^* \subseteq \mathrm{Reg}\,A^*$ (1a) and $\mathrm{Reg}\,A^* \subseteq \mathrm{Rec}\,A^*$ (1b)

and is constructive. In the earlier proofs, inclusion (1a) is established by an
algorithm, say $\Phi$, that takes an automaton $\mathcal{A}$ and produces a rational expression $E$
(we thus can write $E = \Phi(\mathcal{A})$) such that the language denoted by $E$ is equal
to the language accepted by $\mathcal{A}$. And conversely, inclusion (1b) is obtained by
showing that $\mathrm{Rec}\,A^*$ is (effectively) closed under union, product and star.

This closure proof is easily turned into an algorithm, say $\Psi$, that takes an
expression $E$ and computes an automaton $\mathcal{A} = \Psi(E)$ with the property that
the language accepted by $\mathcal{A}$ is equal to the language denoted by $E$. It was not
long before it was understood that these algorithms and their properties are as
interesting in themselves as they are as a piece of the proof of Kleene’s theorem.

The problem we address here is to find $\Phi$-type and $\Psi$-type algorithms which
would be inverse of each other, that is, which go back and forth between



M. Farach-Colton (Ed.): LATIN 2004, LNCS 2976, pp. 242-251, 2004. 
© Springer- Verlag Berlin Heidelberg 2004 




How Expressions Can Code for Automata 






expressions and automata not only at the level of the families but at the level
of the individual objects. In order to understand the challenge of this problem,
we have to say more about the $\Phi$-type and $\Psi$-type algorithms.




Fig. 1. The state elimination method on Vi, the “divisor by 3” 

The two better known algorithms of the $\Phi$-type (i.e. from automata to
expressions) are the so-called “McNaughton-Yamada” algorithm ([13]) and the “state
elimination method” (cf. [16,17] for instance)¹. Although the computations
involved in these algorithms are somewhat different (above all they are organized
in a different way), they produce roughly the same results. Both algorithms
depend on an ordering of the states of the automaton. Figure 1 shows the results of
the state elimination method on the same automaton, with three different
orderings of the states. The result, and in particular its size, may vary considerably
with the ordering that is used. But one cannot avoid a combinatorial explosion
in the general case²:

Fact 1 The size of a rational expression E computed from a finite automaton A
by the state elimination method may be exponential in the number of states of A.
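
The dependence of the result on the elimination order can be observed on a small case. The Python sketch below implements a plain state elimination that outputs a pattern in the syntax of Python's `re` module rather than in the notation of this paper; the function name `eliminate`, the fresh states `"I"` and `"F"`, and the restriction to a single initial and a single final state are choices made for this sketch. The example automaton is our reading of the “divisor by 3”: over {a, b} with a read as the digit 0 and b as the digit 1, state i goes to 2i mod 3 on a and to 2i + 1 mod 3 on b.

```python
import re
from itertools import permutations, product


def eliminate(init, final, edges, order):
    """State elimination. `edges` maps (p, q) to a regex; returns a regex string."""
    E = dict(edges)
    # Fresh initial and final states, linked by empty-word transitions.
    E[("I", init)] = ""
    E[(final, "F")] = ""

    def union(r, s):
        return s if r is None else "(?:%s|%s)" % (r, s)

    for q in order:
        loop = E.pop((q, q), None)
        star = "" if loop is None else "(?:%s)*" % loop
        ins = [(p, r) for (p, t), r in E.items() if t == q]
        outs = [(t, r) for (p, t), r in E.items() if p == q]
        for key in [k for k in E if q in k]:
            del E[key]
        # Every path p -> q -> t is replaced by a direct transition p -> t.
        for p, r_in in ins:
            for t, r_out in outs:
                new = "(?:%s)%s(?:%s)" % (r_in, star, r_out)
                E[(p, t)] = union(E.get((p, t)), new)
    return E.get(("I", "F"))


# Assumed encoding of the "divisor by 3" automaton (a = digit 0, b = digit 1).
EDGES = {(0, 0): "a", (0, 1): "b", (1, 2): "a",
         (1, 0): "b", (2, 1): "a", (2, 2): "b"}

# Length of the pattern obtained for each of the six elimination orders.
sizes = {order: len(eliminate(0, 0, EDGES, order))
         for order in permutations(range(3))}
```

All six patterns denote the same language, but `sizes` shows that their lengths differ with the order, which is the phenomenon Figure 1 illustrates.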

There is a larger variety of algorithms turning a rational expression into a
finite automaton (that is, $\Psi$-type algorithms), both in results and in methods,
than of those of $\Phi$-type. They fall roughly into two families.

The first class of algorithms yields what is often called the Glushkov, or the
position, automaton of an expression ([10]). It is a non deterministic automaton
with $n + 1$ states for an expression of literal length $n$. The Thompson
construction ([15]) produces an automaton with $\varepsilon$-moves which is transformed into the
position automaton when the $\varepsilon$-moves are eliminated in the adequate way. Let
us denote by $\Psi_p$ an algorithm that produces the position automaton.

The algorithms of the second class are based on the definition of the
derivation of an expression. First introduced by Brzozowski ([5]), the definition of
derivation has been slightly, but smartly, modified by Antimirov ([1]) and yields
a non deterministic automaton which we propose to call the derived term
automaton of the expression and which is smaller than or equal to the position
automaton. The automaton of derived expressions computed in [5] is the
determinized automaton of the derived term automaton. Champarnaud and Ziadi
([7]) have given an efficient method to compute the derived term automaton of
an expression.

¹ A third one ([9]) gives rise to elegant proofs but is not useful for actual computations.
² e.g. an automaton whose underlying graph is the complete graph on the set of states.

A bridge between the two families of algorithms was first given by Berry-Sethi
who showed that the Brzozowski derivation applied on a “linearized” version of
an expression gives the position automaton of that expression ([3,4]), and then
by Champarnaud-Ziadi who showed that the derived term automaton of an
expression is a quotient (i.e. a morphic image) of the position automaton ([8]).

Fact 2 In the worst case, the minimal size (number of states) of an automaton
accepting the language denoted by an expression is linear in the literal length of
the expression³.

The juxtaposition of Facts 1 and 2 shows that there is no hope to find algo- 
rithms which are inverse of each other if we stay in these general families. 

In [6], Caron and Ziadi describe an algorithm, say $\Theta$, which decides whether
or not an automaton $\mathcal{A}$ is the position automaton of a rational expression $E$;
and if the answer is positive, $\Theta$ moreover computes $E$. Even if $\Theta$ is not properly
a $\Phi$-type algorithm since it does not compute an expression for any automaton,
it holds:

For any rational expression $E$, $\Theta(\Psi_p(E)) = E$. (1)

Our purpose here is to describe a (slight) modification of $\Phi$ into $\Phi'$ and
an algorithm $\Omega$ which, given a rational expression $E$, computes an equivalent
automaton and such that, if $E$ is obtained from an automaton $\mathcal{A}$ by a $\Phi'$-type
algorithm, then the result of $\Omega$ is precisely $\mathcal{A}$.

For any automaton $\mathcal{A}$, $\Omega(\Phi'(\mathcal{A})) = \mathcal{A}$. (2)

In the next section, we present the two main constructions on which such
an $\Omega$ is built: the (barely modified) Antimirov derivation of expressions
and the co-minimization of an automaton. We then (section 3) observe that these
constructions yield the core of an algorithm and describe the partial linearization
which makes the algorithm work in every case (Theorem 2). In conclusion, we
mention several directions of investigation in order to minimize the linearization.

Space limitation has forced us to reduce to a very sketchy state the definitions 
and the description of the reduction of the linearization. In contrast, we have 
kept full length development for the examples together with their figures for we 
think they are the best introduction to the subject. A paper under the same title 
is to be published in a special issue of TIA-RAIRO dedicated to Imre Simon 
and gives a much more detailed presentation ([12]). 

³ e.g. $E = f^*$ where $f$ is a word.
2 The Ingredients of a Solution

In the sequel, $A$ is an alphabet, i.e. a finite set of letters, and $A^*$ the free
monoid generated by $A$. Rational expressions over $A^*$ are the well-formed
formulae with 0, 1, $a \in A$ as atomic formulae, $*$ as unary operator and $+$ and $\cdot$
as binary operators. The constant term of an expression $E$, denoted by $c(E)$, is
(the boolean) 1 or 0 according to whether the empty word belongs or not to the
language denoted by $E$. It can easily be computed on the rational expression (cf.
[1,11] for instance).



2.1 The Automaton of Derived Terms 



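Definition 1 ([1]). Let $E$ be a rational expression on $A$ and let $a$ be a letter
in $A$. The B-derivative⁴ of $E$ with respect to $a$, denoted $\frac{\partial}{\partial a} E$, is a set of rational
expressions on $A$, recursively defined by:

$$\frac{\partial}{\partial a}\, 0 = \frac{\partial}{\partial a}\, 1 = \emptyset, \qquad \forall b \in A \quad \frac{\partial}{\partial a}\, b = \begin{cases} \{1\} & \text{if } b = a \\ \emptyset & \text{otherwise} \end{cases} \tag{3}$$

$$\frac{\partial}{\partial a}(E + F) = \frac{\partial}{\partial a} E \cup \frac{\partial}{\partial a} F, \qquad \frac{\partial}{\partial a}(E \cdot F) = \Big[\frac{\partial}{\partial a} E\Big] \cdot F \ \cup\ c(E)\, \frac{\partial}{\partial a} F \tag{4}$$

$$\frac{\partial}{\partial a}(E^*) = \Big[\frac{\partial}{\partial a} E\Big] \cdot E^* \tag{5}$$

The induction implied by (3-5) should be interpreted by distributing derivation
and product over union:

$$\frac{\partial}{\partial a}\Big[\bigcup_{i \in I} E_i\Big] = \bigcup_{i \in I} \frac{\partial}{\partial a} E_i, \qquad \Big[\bigcup_{i \in I} E_i\Big] \cdot F = \bigcup_{i \in I} (E_i \cdot F).$$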

Definition 2. Let $E$ be a rational expression on $A$ and $g$ a non empty word
of $A^*$, i.e. $g = fa$ with $a$ in $A$. The B-derivative of $E$ with respect to $g$,
denoted $\frac{\partial}{\partial g} E$, is a set of rational expressions over $A$, recursively defined by the
formulae (3-5) and by:

$$\frac{\partial}{\partial g} E = \frac{\partial}{\partial fa} E = \frac{\partial}{\partial a}\Big(\frac{\partial}{\partial f} E\Big).$$

We shall call derived term of $E$ any rational expression which belongs to a
set $\frac{\partial}{\partial g} E$ for some $g$ in $A^*$.

⁴ We call it “B-derivative” and not simply “derivative” for two reasons. First in order to
avoid confusion with the derivation defined by Brzozowski, and second because this
derivation is better understood (as we have explained in [11]) if the expressions
are considered as expressions with multiplicity: “classical” expressions are expressions
with multiplicity in the Boolean semiring B.
Theorem 1 (Antimirov [1]). The number of derived terms of a rational
expression $E$ is finite and smaller than or equal to the literal length of $E$ plus 1.



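Example 1. Let $E_2 = (a + bb + ba(b + aa)^*ab)^*$ be the expression computed in
Figure 1 (b). The computation of the derived terms of $E_2$ goes as follows [for sake
of conciseness we put $H_1 = (b + aa)^*ab$]:

$\frac{\partial}{\partial a} E_2 = \{E_2\}$, $\frac{\partial}{\partial b} E_2 = \{bE_2,\ aH_1E_2\}$,

$\frac{\partial}{\partial a}[bE_2] = \emptyset$, $\frac{\partial}{\partial b}[bE_2] = \{E_2\}$,

$\frac{\partial}{\partial a}[aH_1E_2] = \{H_1E_2\}$, $\frac{\partial}{\partial b}[aH_1E_2] = \emptyset$,

$\frac{\partial}{\partial a}[H_1E_2] = \{bE_2,\ aH_1E_2\}$, $\frac{\partial}{\partial b}[H_1E_2] = \{H_1E_2\}$.

Thus $E_2$ has 4 derived terms: $E_2$ itself, $bE_2$, $aH_1E_2$ and $H_1E_2$.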
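
The computation of Example 1 can be replayed mechanically. The Python sketch below encodes expressions as nested tuples and implements the derivation rules of Definition 1; the encoding and the function names are ours. One assumption is made explicit: terms are compared up to the identity $1 \cdot F = F$ and up to associativity of the product (products are kept right-nested by `cat`), which is what lets syntactically different results such as $(aH_1)E_2$ and $a(H_1E_2)$ be counted as one derived term.

```python
ZERO, ONE = ("0",), ("1",)


def sym(a):
    return ("sym", a)


def plus(e, f):
    return ("+", e, f)


def star(e):
    return ("*", e)


def cat(e, f):
    """Product, normalised: the factor 1 is dropped and products are right-nested."""
    if e == ONE:
        return f
    if f == ONE:
        return e
    if e[0] == ".":
        return cat(e[1], cat(e[2], f))
    return (".", e, f)


def const(e):
    """Constant term c(E): True iff the empty word belongs to the language of E."""
    op = e[0]
    if op in ("1", "*"):
        return True
    if op in ("0", "sym"):
        return False
    if op == "+":
        return const(e[1]) or const(e[2])
    return const(e[1]) and const(e[2])  # product


def deriv(e, a):
    """B-derivative of e with respect to the letter a, as a set of expressions."""
    op = e[0]
    if op in ("0", "1"):
        return set()
    if op == "sym":
        return {ONE} if e[1] == a else set()
    if op == "+":
        return deriv(e[1], a) | deriv(e[2], a)
    if op == ".":
        left = {cat(k, e[2]) for k in deriv(e[1], a)}
        return left | (deriv(e[2], a) if const(e[1]) else set())
    return {cat(k, e) for k in deriv(e[1], a)}  # star


def derived_terms(e, alphabet):
    """e together with everything reachable from e by iterated derivation."""
    terms, todo = {e}, [e]
    while todo:
        k = todo.pop()
        for a in alphabet:
            for d in deriv(k, a):
                if d not in terms:
                    terms.add(d)
                    todo.append(d)
    return terms


a, b = sym("a"), sym("b")
H1 = cat(star(plus(b, cat(a, a))), cat(a, b))            # (b + aa)* ab
E2 = star(plus(a, plus(cat(b, b), cat(b, cat(a, H1)))))  # (a + bb + ba H1)*
```

Calling `derived_terms(E2, "ab")` yields the four terms listed in Example 1, in agreement with the bound of Theorem 1.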

The above definition is the one given by Antimirov, which we have kept
for accurate reference. We now slightly modify the definition of derived terms
in order to reach our goal. For that purpose, we first define⁵ a new operation
on rational expressions which, roughly speaking, consists in decomposing an
expression into a set of expressions whose first factor is not a sum.

Definition 3. i) The set of initial derived terms of an expression $E$ is a set
$d(E)$ of expressions inductively defined by:

$d(0) = \{0\}$, $d(1) = \{1\}$, $d(a) = \{a\}$, $\forall a \in A$,
$d(E + F) = d(E) \cup d(F)$, $d(E \cdot F) = [d(E)] \cdot F$, $d(E^*) = \{E^*\}$.

ii) The set of derived terms of $E$ is redefined as the smallest set that contains
the initial derived terms of $E$ and that is closed under derivation (in the sense
of Definition 1).
In [1], Antimirov has defined an automaton by means of the derived terms 
and we use here the same construction mutatis mutandis. 

Definition 4. The derived term automaton of an expression E is the automaton
$\mathcal{A}_E$ whose states are the derived terms of E and whose transitions are defined
by:

i) the initial states are the initial derived terms of E;

ii) a state K is final if and only if c(K) = 1;

iii) (K, a, K') is a transition of $\mathcal{A}_E$ if and only if K' belongs to $\frac{\partial}{\partial a}K$.

The automaton $\mathcal{A}_E$ recognizes the language denoted by E (the proof goes as
in [1]). In the sequel, we denote by $\Delta$ the function that maps a rational expression
onto its derived term automaton: $\Delta(E) = \mathcal{A}_E$ (and $\Delta$ is a $\Psi$-type algorithm).
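The construction of Definitions 3 and 4 can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the tuple encoding, the helper names, and the choice to keep products right-nested (so that terms equal up to associativity coincide) are assumptions made here, and the derivation rules coded in `deriv` are the usual ones of Antimirov, taken as an assumption since formulae (3-5) lie outside this passage.

```python
ZERO, ONE = ('0',), ('1',)

def sym(a): return ('sym', a)
def plus(e, f): return ('+', e, f)
def star(e): return ('*', e)

def cat(e, f):
    """Product e.f, kept right-nested and with 1.f simplified to f."""
    if e == ONE:
        return f
    if e[0] == '.':
        return ('.', e[1], cat(e[2], f))
    return ('.', e, f)

def const(e):
    """c(E): True iff the empty word belongs to the language of E."""
    op = e[0]
    if op in ('0', 'sym'):
        return False
    if op in ('1', '*'):
        return True
    if op == '+':
        return const(e[1]) or const(e[2])
    return const(e[1]) and const(e[2])          # product

def deriv(e, a):
    """Set of derivatives of e with respect to the letter a."""
    op = e[0]
    if op in ('0', '1'):
        return set()
    if op == 'sym':
        return {ONE} if e[1] == a else set()
    if op == '+':
        return deriv(e[1], a) | deriv(e[2], a)
    if op == '.':
        left = {cat(k, e[2]) for k in deriv(e[1], a)}
        return (left | deriv(e[2], a)) if const(e[1]) else left
    return {cat(k, e) for k in deriv(e[1], a)}  # star

def initial_terms(e):
    """d(E) of Definition 3."""
    if e[0] == '+':
        return initial_terms(e[1]) | initial_terms(e[2])
    if e[0] == '.':
        return {cat(k, e[2]) for k in initial_terms(e[1])}
    return {e}

def derived_term_automaton(e, alphabet):
    """States, initial states, final states and transitions of Definition 4."""
    initial = initial_terms(e)
    states, todo, transitions = set(initial), list(initial), set()
    while todo:
        k = todo.pop()
        for a in alphabet:
            for k2 in deriv(k, a):
                transitions.add((k, a, k2))
                if k2 not in states:
                    states.add(k2)
                    todo.append(k2)
    finals = {k for k in states if const(k)}
    return states, initial, finals, transitions


# E2 = (a + bb + ba(b + aa)*ab)* from Example 1
a, b = sym('a'), sym('b')
H1 = cat(star(plus(b, cat(a, a))), cat(a, b))
E2 = star(plus(plus(a, cat(b, b)), cat(b, cat(a, H1))))
states, initial, finals, transitions = derived_term_automaton(E2, 'ab')
```

Run on the expression of Example 1, the sketch yields four states, in agreement with the four derived terms listed there.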



® As we have explained in [11], this operation can be considered as a derivation with 
respect to the empty word. 
Example 2 (Ex. 1 cont.). As $d(E_2) = \{E_2\}$, the new definition does not change the
derived terms of $E_2$ and Figure 2 shows the derived term automaton of $E_2$.




Fig. 2. The derived term automaton of $E_2 = (a + bb + ba(b + aa)^*ab)^*$



Example 3. Let us consider the automaton $\mathcal{A}_1$ of Figure 3 (a) and let $E_4$ be
the expression obtained by the state elimination method on $\mathcal{A}_1$ using the order
1-2-3. It holds:

$$E_4 = a^* + a^*b(ba^*b)^*ba^* + \bigl(a^*a + a^*b(ba^*b)^*(a + ba^*a)\bigr)F_1,$$ where

$$F_1 = \bigl(b + ba^*a + (a + ba^*b)(ba^*b)^*(a + ba^*a)\bigr)^* \bigl(ba^* + (a + ba^*b)(ba^*b)^*ba^*\bigr).$$

Then, $d(E_4) = \{a^*,\ a^*b(ba^*b)^*ba^*,\ a^*aF_1,\ a^*b(ba^*b)^*(a + ba^*a)F_1\}$. And the
derived terms of $E_4$ are read on the automaton $\Delta(E_4)$ itself (Figure 3 (b)).




Fig. 3. Anticipation of the algorithm 



2.2 Minimal Co-quotient of an Automaton 

The classical process of minimization of a deterministic automaton amounts to
comparing the transitions going out of the states of the automaton. It is less classical, but
not new by far, to consider the same kind of process on automata that are
not necessarily deterministic (cf. for instance the definition of simulation among
transition systems [2]). We are interested here in the dual of such a process; it
could be defined on the transposed automata, but we prefer to give the direct
definition using the incoming transitions.
Definition 5. Let $\mathcal{A}$ be an automaton and Q its set of states.

i) An equivalence $\sim$ on Q is an In-similarity equivalence if $p \sim p'$ implies:

a) p is initial if and only if p' is initial;

b) if there exists a transition (q, a, p), there exists a state q' such that:

$q' \sim q$ and (q', a, p') is a transition of $\mathcal{A}$.

ii) An automaton $\mathcal{B}$ is a co-quotient of $\mathcal{A}$ if there exist an In-similarity
equivalence $\sim$ on Q and a bijection $\varphi$ between the states of $\mathcal{B}$ and the classes of $\sim$
such that:

a) a state r of $\mathcal{B}$ is initial iff the states of $\varphi(r)$ are initial;

b) a state r of $\mathcal{B}$ is final iff at least one state of $\varphi(r)$ is final in $\mathcal{A}$;

c) (s, a, r) is a transition of $\mathcal{B}$ iff, for every $p \in \varphi(r)$, there exists $q \in \varphi(s)$
such that (q, a, p) is a transition of $\mathcal{A}$.

iii) If $\sim$ is the coarsest In-similarity equivalence on Q, $\mathcal{B}$ is the minimal
co-quotient of $\mathcal{A}$ and $\mathcal{B}$ is said to be a co-minimal automaton.

The minimal co-quotient can be computed by a Moore algorithm that consists
in refining the trivial partition by splitting the classes that are in contradiction
with the In-similarity property. The algorithm stops as soon as the partition is
an In-similarity equivalence. We denote this co-minimization algorithm by T.
Let us note that $T(\mathcal{A})$ is canonically attached to $\mathcal{A}$ and not to the language
accepted by $\mathcal{A}$.
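The refinement just described can be sketched as follows. This is a naive illustration under stated assumptions: the function name, the representation of transitions as triples `(q, a, p)`, and the choice of starting from the partition into initial and non-initial states are made here for the example, and no attempt is made at an efficient implementation.

```python
def coarsest_in_similarity(states, initial, transitions):
    """Classes of the coarsest In-similarity equivalence (Definition 5 i)."""
    # Start from the partition {initial states, other states}.
    block = {p: p in initial for p in states}
    while True:
        # Signature of p: its current class and the set of
        # (letter, class of origin) over its incoming transitions.
        signature = {
            p: (block[p],
                frozenset((a, block[q]) for (q, a, r) in transitions if r == p))
            for p in states
        }
        ids = {}
        refined = {p: ids.setdefault(signature[p], len(ids)) for p in states}
        if len(set(refined.values())) == len(set(block.values())):
            break                       # no class was split: stable partition
        block = refined
    classes = {}
    for p in states:
        classes.setdefault(block[p], set()).add(p)
    return {frozenset(c) for c in classes.values()}


# Derived term automaton of E2, with the transitions given by Example 1.
states = ['E2', 'bE2', 'aH1E2', 'H1E2']
transitions = {
    ('E2', 'a', 'E2'), ('E2', 'b', 'bE2'), ('E2', 'b', 'aH1E2'),
    ('bE2', 'b', 'E2'), ('aH1E2', 'a', 'H1E2'),
    ('H1E2', 'a', 'aH1E2'), ('H1E2', 'a', 'bE2'), ('H1E2', 'b', 'H1E2'),
}
classes = coarsest_in_similarity(states, {'E2'}, transitions)
```

Two states end up in the same class exactly when they have the same initial status and the same set of incoming (letter, class) pairs, which is the symmetric form of condition b) for an equivalence.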

Example 4 (Ex. 1 cont.). The states $bE_2$ and $aH_1E_2$ are In-similar in the
automaton of Figure 2. The minimal co-quotient is therefore $\mathcal{P}_1$ (Figure 1).

Example 5 (Ex. 3 cont.). It should be clear that the horizontal layers in
Figure 3 (b) form the maximal In-similarity equivalence and that the minimal
co-quotient of $\Delta(E_4)$ is thus equal to $\mathcal{A}_1$ (Figure 3 (a)).

3 Building a Solution 

The above two examples show two instances where:

$$\mathcal{A} = T \circ \Delta \circ \Phi(\mathcal{A}) \qquad (7)$$

and this is the main idea of the paper: $T \circ \Delta$ is “fundamentally” the inverse of $\Phi$.
Observe that this would not hold (e.g. in Example 3) if we had not modified the
definition of the derivation. The same equality (and observation) hold if $T \circ \Delta$
is applied to $E_1$ or to $E_3$ (Figure 1 (a) and (c)).

However, it is clear that (7) cannot hold in full generality: if A is not co- 
minimal for instance, certainly (7) does not hold. But the situation is even more 
tricky and it may happen that (7) does not hold even for co-minimal automata, 
as shown by the following example. 

Example 6. The automaton $\mathcal{P}'_1$ (Figure 4 (a)) is co-minimal. After two steps
of the computation of $\Phi$ (following the indicated ordering), the configuration is
the same as the one obtained after one step on the automaton of Figure 1 (b).
Thus $\Phi(\mathcal{P}'_1) = E_2$ and $T(\Delta(E_2))$ is equal to $\mathcal{P}_1$ and not to $\mathcal{P}'_1$.








(a) The automaton $\mathcal{P}'_1$ (b) $\mathcal{P}'_1$ after the elimination of two states
Fig. 4. The state elimination method on the automaton $\mathcal{P}'_1$

A way of escaping the above mentioned difficulties is to “decorate” some
labels of the automaton in order to indicate in the expression that some
occurrences of letters in the expression come from different transitions. We call this
operation a partial linearization and we denote it by $\Lambda$. The delinearization is a
projection that we denote by $\Pi$. The aim is obviously to keep the linearization as
small as possible. However, so far, we can only prove the correctness
of the algorithm if the linearization makes the automaton not only co-minimal
but also co-deterministic (that is, reverse deterministic). This will give sufficient
conditions that are certainly not necessary. We come back to this question in
the conclusion.

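Theorem 2. Let $\mathcal{A}$ be an automaton, $\Lambda$ a partial linearization that makes $\Lambda(\mathcal{A})$
a minimal co-deterministic automaton and $\Pi$ the corresponding delinearization.
It then holds:

$$\mathcal{A} = \Pi \circ T \circ \Delta \circ \Phi \circ \Lambda(\mathcal{A}). \qquad (8)$$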



If we come back to the notation of the introduction, Theorem 2 gives the $\Phi'$
and $\Omega$ we are looking for: $\Phi' = \Phi \circ \Lambda$ and $\Omega = \Pi \circ T \circ \Delta$.

Example 7 (Ex. 6 cont.). $\mathcal{P}'_1$ is linearized into $\mathcal{P}''_1$ as shown on Figure 5 (a); the
result of $\Phi$ is $E'_2 = (a + bb + ba(b + aa)^*ab)^*$, and $\Delta(E'_2)$ is an automaton whose
minimal co-quotient is $\mathcal{P}''_1$.




(a) The linearized automaton $\mathcal{P}''_1$ (b) The derived term automaton of E2.



Fig. 5. The complete algorithm on automaton $\mathcal{P}'_1$

Theorem 2 is a direct consequence of the following: 

Theorem 3. Let $\mathcal{A}$ be a co-deterministic automaton and $E = \Phi(\mathcal{A})$ a rational
expression computed from $\mathcal{A}$ by the state elimination method. Then, the derived
term automaton $\Delta(E)$ of E is co-deterministic.
Idea of the proof. Every occurrence of a letter in E (and in any derived term
of E) comes from a certain transition. The study of the way this transition is
eliminated during $\Phi$ (whether it is a loop or whether or not the origin of the
transition is smaller than the end state with respect to the ordering $\omega$ that was
used for the elimination) allows us to describe the form of the derived terms and to
prove, by using the assumption of co-determinism, that, for every letter a, there
is at most one transition labelled by a that arrives in every derived term in the
automaton $\Delta(E)$. □

Proof of Theorem 2. The minimal co-quotient of any co-deterministic automaton
is the co-minimal automaton of the language. Therefore, if $\mathcal{A}$ is the co-minimal
automaton of the language, $T(\Delta(E)) = \mathcal{A}$. □

Remark 1. If we reverse our construction®, Theorem 3 states that if $\mathcal{A}$ is
deterministic, a (right) derived term automaton of $\Phi(\mathcal{A})$ can be built which is directly
a deterministic automaton, which has a linear number of states (in the size of
$\Phi(\mathcal{A})$), and this without running any determinization.

4 Discussion 

As we said, the conditions put on $\mathcal{A}$ in Theorem 2 are sufficient but far from being
necessary. For instance, the automaton $\mathcal{A}_1$ in Example 3 is not co-deterministic
and yet $\mathcal{A}_1 = T(\Delta(\Phi(\mathcal{A}_1)))$ holds. This example and other computations
have led us to consider several ways that can help in distinguishing derived terms
and thus reducing the role of $\Lambda$. So far, they can serve as heuristics and
it is our current work to turn these ideas into precise statements. Let us quote
here three of these directions of research.

Choosing a smart ordering. The automaton $\mathcal{P}'_1$ of Example 6 may give an
illustration of this idea. The reader can check that if the elimination on $\mathcal{P}'_1$ is
performed in the ordering 3-2-4-1, the resulting expression is $E''_2 = \Phi(\mathcal{P}'_1) =
(a + b(b + ab^*a(ab^*a)^*b))^*$ and $\mathcal{P}'_1 = T(\Delta(\Phi(\mathcal{P}'_1)))$: no linearization at all is
necessary. Thus, the search for a smart ordering may be, to some extent, an
alternative to the linearization of $\mathcal{A}$.

Using the structure of $\Delta(\Phi(\mathcal{A}))$. Figure 3 (b) shows clearly the structure of
the derived term automaton of an expression $E = \Phi(\mathcal{A})$ that is computed on a
strongly connected automaton $\mathcal{A}$: the last state p of $\mathcal{A}$ to be eliminated corresponds
to a term that is a cutvertex in $\Delta(E)$ and this property holds inductively on
subautomata. This observation is another way to distinguish states that are
otherwise labelled by a same derived term. It thus leads to an improved version
of $\Delta$ which may again depend on the ordering of the elimination.

Taking multiplicity into account. A fundamental property of the algorithms $\Phi$
and $\Delta$ is the fact that they respect the multiplicity of paths. The minimal
co-quotient of $\Delta(\Phi(\mathcal{A}))$ when computed as an automaton with multiplicity (in $\mathbb{N}$)
may contain transitions with coefficients larger than 1 (and this is clearly not
what is wanted). Therefore, T has to be modified into an algorithm T' which
computes a $\mathbb{B}$-automaton that is a co-covering of $\Delta(\Phi(\mathcal{A}))$ (as defined in [14]).
But contrary to the minimal co-quotient, the minimal co-covering of an automaton
is not necessarily unique.



We have defined a derivation that is applied to the left side of the expression; a similar
operation can be defined on the right side, and we would call the result the right
derived terms.

These examples and remarks give strong evidence that the computation
of the derived terms of an expression is not only an algorithm that builds an
equivalent automaton but also a way to retrieve the “track” of the states of an
automaton when the expression has been computed from that automaton. How
faithful these tracks are, and how to read them efficiently, are questions that
are still under investigation.
Abstract. The set $\mathbb{Z}_\beta$ of $\beta$-integers is a Meyer set when $\beta$ is a Pisot
number, and thus there exists a finite set F such that $\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$.
We give finite automata describing the expansions of the elements of
$\mathbb{Z}_\beta$ and of $\mathbb{Z}_\beta - \mathbb{Z}_\beta$. We present a construction of such a finite set F,
and a method to minimize the size of F. We obtain in this way a finite
transducer that performs the decomposition of the elements of $\mathbb{Z}_\beta - \mathbb{Z}_\beta$
as a sum belonging to $\mathbb{Z}_\beta + F$.

1 Introduction 

The so-called Meyer sets have been introduced by Meyer [11,12] under the name
of “quasicrystals” in order to formalize the quasicrystals discovered by the
physicists in the eighties. A set $\Lambda$ is a Delaunay set if it is uniformly discrete and
relatively dense. A set $\Lambda$ is a Meyer set if it is a Delaunay set and there exists
a finite set F such that $\Lambda - \Lambda \subset \Lambda + F$. There exist strong relations between
Meyer sets and some algebraic integers. Recall that a Pisot number (or a
Pisot-Vijayaraghavan number) is an algebraic integer > 1 such that all its algebraic
conjugates have modulus strictly less than one. A Salem number is an algebraic
integer such that every conjugate has modulus smaller than or equal to 1, and
at least one of them has modulus 1. The following result from Meyer makes
the connection between Meyer sets and those algebraic integers. If $\Lambda \subset \mathbb{R}^n$ is a
Meyer set and if $\beta > 1$ is a real number such that $\beta\Lambda \subset \Lambda$ then $\beta$ is a Pisot or
a Salem number. Conversely for each n and for each Pisot or Salem number $\beta$,
there exists a Meyer set $\Lambda \subset \mathbb{R}^n$ such that $\beta\Lambda \subset \Lambda$.

Note that all the quasicrystals encountered in the real world are linked to
quadratic Pisot numbers, namely $\frac{1+\sqrt{5}}{2}$, $1 + \sqrt{2}$ and $2 + \sqrt{3}$.

In this paper we study Meyer sets $\mathbb{Z}_\beta$ associated with $\beta$-expansions, $\beta$ being
a Pisot number, and give a construction of a minimal finite set F such that
$\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$.

Lagarias [8] gave a general construction of a finite set F satisfying $\Lambda - \Lambda \subset
\Lambda + F$ for a Delaunay set $\Lambda$ such that $\Lambda - \Lambda$ is also a Delaunay set. But the

* Work supported by the CNRS/JSPS contract number 13569 
sets obtained are huge and no method of minimization of these sets is known.
Minimal sets F are given in [3] for $\mathbb{Z}_\beta$ when $\beta$ is a quadratic Pisot unit. When
$\beta$ is a quadratic Pisot number, a possible set F for $\mathbb{Z}_\beta$ is exhibited in [6].

We first give finite automata describing the formal addition and subtraction
of beta-integers. We characterize the cases when the formal addition gives a
system of finite type when the original system $\mathbb{Z}_\beta$ is of finite type.

We then give a construction of a family of finite sets F such that $\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset
\mathbb{Z}_\beta + F$, and a method to minimize the size of the sets F we built. We obtain
in this way a finite transducer that performs the decomposition of the result of
the formal subtraction $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ into a sum belonging to $\mathbb{Z}_\beta + F$.

2 Preliminaries 

Let A be a finite alphabet. A concatenation of letters of A is called a word. The
set A* of all finite words equipped with the empty word $\varepsilon$ and the operation
of concatenation is a free monoid. We denote by $a^k$ the word obtained by
concatenating k letters a. The length of a word $w = w_0w_1 \cdots w_{n-1}$ is denoted by
|w| = n. One considers also infinite words $v = v_0v_1v_2 \cdots$. The set of infinite
words on A is denoted by $A^{\mathbb{N}}$. An infinite word v is said to be eventually periodic
if it is of the form $v = wz^\omega$, where w and z are in A* and $z^\omega = zzz \cdots$. A factor
of a finite or infinite word w is a finite word v such that w = uvz; if $u = \varepsilon$, the
word v is a prefix of w. A prefix of w is strict if it is not equal to w.

Definitions and results on numeration systems can be found in [10, Chapter
7]. Let $\beta > 1$ be a real number. Any positive real number x can be represented
in base $\beta$ by the following greedy algorithm [14]. Denote by $\lfloor\cdot\rfloor$ and by $\{\cdot\}$ the
integral part and the fractional part of a number. There exists $k \in \mathbb{Z}$ such that
$\beta^k \le x < \beta^{k+1}$. Let $x_k = \lfloor x/\beta^k \rfloor$ and $r_k = \{x/\beta^k\}$. For $i < k$, put $x_i = \lfloor \beta r_{i+1} \rfloor$
and $r_i = \{\beta r_{i+1}\}$. Then $x = x_k\beta^k + x_{k-1}\beta^{k-1} + \cdots$. If $x < 1$, we get $k < 0$
and we put $x_0 = x_{-1} = \cdots = x_{k+1} = 0$. The sequence $(x_i)_{k \ge i > -\infty}$ is called the
$\beta$-expansion of x, and is denoted by



$$\langle x \rangle_\beta = x_k x_{k-1} \cdots x_1 x_0 \,.\, x_{-1} x_{-2} \cdots$$

most significant digit first. The part $x_{-1}x_{-2}\cdots$ after the “decimal” point is
called the $\beta$-fractional part of x.
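The greedy algorithm can be tried out numerically. The following sketch is only an illustration under explicit assumptions: it works with floating-point numbers, so digits may be wrong for inputs lying exactly on a boundary (for instance $\beta$-integers when $\beta$ is irrational), and the function name and the truncation parameter `frac_digits` are introduced here for the example.

```python
import math

def beta_expansion(x, beta, frac_digits=20):
    """Greedy beta-expansion of x > 0, truncated after `frac_digits` fractional digits.

    Returns (k, digits) with digits[j] = x_{k-j}, i.e. most significant digit first.
    When k < 0 the digits x_0, ..., x_{k+1} are all 0 and are not listed.
    """
    if x <= 0 or beta <= 1:
        raise ValueError("expected x > 0 and beta > 1")
    k = 0
    while beta ** (k + 1) <= x:        # find k with beta^k <= x < beta^(k+1)
        k += 1
    while beta ** k > x:
        k -= 1
    y = x / beta ** k
    digit = math.floor(y)              # x_k = floor(x / beta^k)
    r = y - digit                      # r_k = frac(x / beta^k)
    digits = [digit]
    for _ in range(k + frac_digits):   # indices k-1 down to -frac_digits
        y = beta * r
        digit = math.floor(y)          # x_i = floor(beta * r_{i+1})
        r = y - digit                  # r_i = frac(beta * r_{i+1})
        digits.append(digit)
    return k, digits


def value(k, digits, beta):
    """Evaluate the truncated expansion back to a number."""
    return sum(d * beta ** (k - j) for j, d in enumerate(digits))


golden = (1 + math.sqrt(5)) / 2
k, digits = beta_expansion(3.7, golden, frac_digits=30)
```

Evaluating the returned digits with `value` gives back x up to an error of order $\beta^{-n}$ for n fractional digits, which is a convenient sanity check of the recurrence.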

The digits $x_i$ are elements of the canonical alphabet $A_\beta = \{0, \ldots, \lfloor\beta\rfloor\}$ if
$\beta \notin \mathbb{N}$ and $A_\beta = \{0, \ldots, \beta - 1\}$ otherwise. When a $\beta$-expansion ends in infinitely
many zeroes, it is said to be finite, and the 0's are omitted.

A finite or infinite word w on $A_\beta$ which is the $\beta$-expansion of some number
x is said to be admissible. Leading 0's are allowed.

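The set $\mathbb{Z}_\beta$ of $\beta$-integers is the set of real numbers x such that the $\beta$-fractional
part of |x| is equal to 0,

$$\mathbb{Z}_\beta = \{x \in \mathbb{R} \mid \langle |x| \rangle_\beta = x_k \cdots x_0\} = \mathbb{Z}_\beta^+ \cup \mathbb{Z}_\beta^-$$

where $\mathbb{Z}_\beta^+$ is the set of non-negative beta-integers, and $\mathbb{Z}_\beta^- = -\mathbb{Z}_\beta^+$.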
Denote by $D_\beta$ the set of $\beta$-expansions of numbers of [0, 1) and the shift by $\sigma$.
Then $D_\beta$ is shift-invariant. Let $S_\beta$ be its closure in $A_\beta^{\mathbb{N}}$. The set $S_\beta$ is a symbolic
dynamical system, called the $\beta$-shift. The set of finite admissible words is equal to the set of finite
factors of $S_\beta$.

There is a peculiar representation of the number 1 which plays an important
role in the theory. It is denoted by $d_\beta(1)$, and computed by the following
process [14]. Let the $\beta$-transform be defined on [0, 1] by $T_\beta(x) = \beta x \bmod 1$. Then
$d_\beta(1) = (t_i)_{i \ge 1}$, where $t_i = \lfloor \beta T_\beta^{i-1}(1) \rfloor$. Note that $\lfloor\beta\rfloor = t_1$. We recall a result
of Parry [13]: a sequence s of natural integers is an element of $D_\beta$ if and only
if for every $p \ge 1$, $\sigma^p(s)$ is strictly less in the lexicographic order than $d_\beta(1)$ if
$d_\beta(1)$ is infinite, or less than $d^*_\beta(1) = (t_1 \cdots t_{m-1}(t_m - 1))^\omega$ if $d_\beta(1) = t_1 \cdots t_m$
is finite.
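For a finite $d_\beta(1)$, the lexicographic criterion can be checked mechanically on finite words. The sketch below is an illustration with assumed names (`is_admissible`, `quasi_greedy_period`); it treats a finite word as followed by infinitely many zeroes and compares every suffix of the word, the word itself included, with $d^*_\beta(1)$. Comparing a window of length (suffix length + m) suffices because each period of $d^*_\beta(1)$ contains the nonzero digit $t_1$ (or $t_1 - 1 \ge 1$ when m = 1).

```python
from itertools import cycle, islice

def quasi_greedy_period(d_beta):
    """Period t_1 ... t_{m-1} (t_m - 1) of d*_beta(1) for a finite d_beta(1)."""
    return list(d_beta[:-1]) + [d_beta[-1] - 1]

def is_admissible(word, d_beta):
    """True iff every suffix of word.000... is lexicographically < d*_beta(1)."""
    period = quasi_greedy_period(d_beta)
    m = len(period)
    for p in range(len(word)):
        suffix = list(word[p:]) + [0] * m            # pad with zeroes
        bound = list(islice(cycle(period), len(suffix)))
        if not suffix < bound:                       # Python lists compare lexicographically
            return False
    return True
```

For the golden ratio, where $d_\beta(1) = 11$ and $d^*_\beta(1) = (10)^\omega$, the test rejects exactly the words containing the factor 11 or a digit larger than 1.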

A word $w_1 \cdots w_n$ of $A_\beta^*$ is said to be a minimal forbidden word for $S_\beta$ if it
is not a factor of $S_\beta$ and if $w_1 \cdots w_{n-1}$ and $w_2 \cdots w_n$ are factors of $S_\beta$. Recall
that a symbolic dynamical system is said to be of finite type if the set of its
minimal forbidden words is finite. More generally it is said to be sofic if the set
of its finite factors is recognized by a finite automaton. The $\beta$-shift is sofic if and
only if $d_\beta(1)$ is eventually periodic, and it is of finite type if and only if $d_\beta(1)$ is
finite. By abuse we say that the set $\mathbb{Z}_\beta$ of $\beta$-integers is of finite type (resp. sofic)
if $d_\beta(1)$ is finite (resp. infinite eventually periodic). Recall that if $\beta$ is a Pisot
number, then $d_\beta(1)$ is finite or eventually periodic [2,15].

A set $\Lambda \subset \mathbb{R}^n$ is uniformly discrete if there exists a positive real r such that
for any $x \in \mathbb{R}^n$, the open ball of center x and radius r contains at most one
point of $\Lambda$. If $Y \subset \Lambda$ and $\Lambda$ is uniformly discrete, then Y is uniformly discrete.
A set $\Lambda \subset \mathbb{R}^n$ is relatively dense if there exists a positive real R such that for
any $x \in \mathbb{R}^n$, the open ball of center x and radius R contains at least one point of
$\Lambda$. If $\Lambda \subset Y$ and $\Lambda$ is relatively dense, then Y is relatively dense. A set $\Lambda$ is a
Delaunay set if it is uniformly discrete and relatively dense. A set $\Lambda$ is a Meyer
set if it is a Delaunay set and there exists a finite set F such that $\Lambda - \Lambda \subset \Lambda + F$.
Lagarias proved [8] that a set $\Lambda$ is a Meyer set if and only if both $\Lambda$ and $\Lambda - \Lambda$
are Delaunay sets. Note that when $\Lambda$ is a Delaunay set, then $\Lambda - \Lambda$ is relatively
dense, but not necessarily uniformly discrete. For example $\Lambda = \{n + \ldots\}$ is a
Delaunay set and $\Lambda - \Lambda$ has 1 as point of accumulation.

Proposition 1. [3] If $\beta$ is a Pisot number, then the set $\mathbb{Z}_\beta$ of $\beta$-integers is a
Meyer set.

3 Automata for Formal Addition and Subtraction

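In this section we construct automata that symbolically describe the elements
of $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ when $\beta$ is a Pisot number. Note that

$$\mathbb{Z}_\beta - \mathbb{Z}_\beta = (\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+) \cup (\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+) \cup -(\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+). \qquad (1)$$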

The reader is referred to [4] and [16] for definitions and results in automata
theory. We introduce some notations. Denote by $L_{\mathbb{Z}_\beta^+} \subset A_\beta^*$ the set of $\beta$-expansions
of elements of $\mathbb{Z}_\beta^+$ with possible leading 0's. Set $\bar{k} = -k$, where k is an integer,
and let $\bar{A}_\beta = \{\overline{\lfloor\beta\rfloor}, \ldots, \bar{1}, 0\}$. We denote by $L_{\mathbb{Z}_\beta^-} \subset \bar{A}_\beta^*$ the set $\{\bar{w} = \bar{w}_n \cdots \bar{w}_0 \mid
w = w_n \cdots w_0 = \langle -x \rangle_\beta,\ x \in \mathbb{Z}_\beta^-\}$.

When $d_\beta(1)$ is finite or eventually periodic, the set $L_{\mathbb{Z}_\beta^+}$ is recognizable by a
finite automaton [5], of which we recall the construction. If $d_\beta(1) = t_1 \cdots t_m$ is
finite, the automaton $\mathcal{A}_{\mathbb{Z}_\beta^+}$ recognizing $L_{\mathbb{Z}_\beta^+}$ has m states $q_1, \ldots, q_m$. For each
$1 \le i \le m - 1$ there is an edge between $q_i$ and $q_{i+1}$ labelled by $t_i$. For each
$1 \le i \le m$ there are $t_i$ edges between $q_i$ and $q_1$ labelled by $0, \ldots, t_i - 1$. The
initial state is $q_1$, every state is terminal.

If $d_\beta(1) = t_1 \cdots t_m(t_{m+1} \cdots t_{m+p})^\omega$ is infinite eventually periodic, the
automaton $\mathcal{A}_{\mathbb{Z}_\beta^+}$ recognizing $L_{\mathbb{Z}_\beta^+}$ has m + p states $q_1, \ldots, q_{m+p}$. For each
$1 \le i \le m + p - 1$ there is an edge between $q_i$ and $q_{i+1}$ labelled by $t_i$. For each
$1 \le i \le m + p$ there are $t_i$ edges between $q_i$ and $q_1$ labelled by $0, \ldots, t_i - 1$.
There is an edge from $q_{m+p}$ to $q_{m+1}$ labelled by $t_{m+p}$. The initial state is $q_1$;
every state is terminal.
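Both cases of the construction can be sketched together. The code below is an illustration only; the function names and the encoding of the transition function as a dictionary `{(state, digit): state}` (states numbered 1..n for $q_1, \ldots, q_n$) are assumptions made for the example.

```python
def beta_integer_automaton(prefix, period=()):
    """Transition table of the automaton recalled above.

    d_beta(1) = prefix                   (finite case, empty period), or
    d_beta(1) = prefix (period)^omega    (eventually periodic case).
    State i stands for q_i; q_1 is initial and every state is terminal.
    """
    t = list(prefix) + list(period)
    n = len(t)
    delta = {}
    for i in range(1, n + 1):
        for digit in range(t[i - 1]):      # t_i edges q_i -> q_1 labelled 0 .. t_i - 1
            delta[(i, digit)] = 1
        if i < n:                          # edge q_i -> q_{i+1} labelled t_i
            delta[(i, t[i - 1])] = i + 1
    if period:                             # edge q_{m+p} -> q_{m+1} labelled t_{m+p}
        delta[(n, t[-1])] = len(prefix) + 1
    return delta

def accepts(delta, word):
    """Run the (deterministic) automaton from q_1; all states are terminal."""
    state = 1
    for digit in word:
        if (state, digit) not in delta:
            return False
        state = delta[(state, digit)]
    return True


golden = beta_integer_automaton([1, 1])          # d_beta(1) = 11
periodic = beta_integer_automaton([2], [1])      # d_beta(1) = 2(1)^omega
```

In the finite case no edge labelled $t_m$ leaves $q_m$, which is what prevents the forbidden word $t_1 \cdots t_m$ from being read.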

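Clearly the set $L_{\mathbb{Z}_\beta^-}$ is recognizable by the same automaton as $L_{\mathbb{Z}_\beta^+}$ but with
negative labels on edges. Then the automaton for $\mathbb{Z}_\beta$ is $\mathcal{A}_{\mathbb{Z}_\beta} = \mathcal{A}_{\mathbb{Z}_\beta^+} \cup \mathcal{A}_{\mathbb{Z}_\beta^-}$.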

By a general construction one can compute the “sum” of two automata. Let 
A and B be two finite automata with labels in an alphabet of integers. One 
constructs a finite automaton S as follows : 

- the set of states of S is the cartesian product $Q_S = Q_A \times Q_B$;

- there is an edge in S from (p, q) to (p', q') labelled by a + b if and only if
there is an edge from p to p' labelled by a in A and an edge from q to q'
labelled by b in B;

- the set of initial (resp. terminal) states is the cartesian product of the sets
of initial (resp. terminal) states of A and B.

Clearly the automaton S recognizes the set $\{s_N \cdots s_0 \mid N \ge 0,\ s_i = a_i + b_i,\ 0 \le
i \le N,\ a_N \cdots a_0$ is recognized by A and $b_N \cdots b_0$ is recognized by B$\}$.
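The product construction above is short enough to be written out. The sketch below is illustrative: the representation of an automaton as a triple `(initial, terminal, edges)` with edges given as triples `(p, label, p')`, and the function names, are assumptions made here. The resulting automaton is in general nondeterministic, so acceptance is tested by tracking the set of reachable states.

```python
from itertools import product

def sum_automaton(aut_a, aut_b):
    """Automaton S whose labels are the sums a + b of paired edge labels."""
    init_a, term_a, edges_a = aut_a
    init_b, term_b, edges_b = aut_b
    edges = {((p, q), a + b, (p2, q2))
             for (p, a, p2), (q, b, q2) in product(edges_a, edges_b)}
    initial = set(product(init_a, init_b))
    terminal = set(product(term_a, term_b))
    return initial, terminal, edges

def accepts(aut, word):
    """Nondeterministic run: keep the set of states reachable so far."""
    initial, terminal, edges = aut
    current = set(initial)
    for digit in word:
        current = {r for (p, label, r) in edges if p in current and label == digit}
    return bool(current & terminal)


# Automaton of the non-negative beta-integers for the golden ratio (d_beta(1) = 11).
golden = ({1}, {1, 2}, {(1, 0, 1), (1, 1, 2), (2, 0, 1)})
formal_sum = sum_automaton(golden, golden)
```

For subtraction the same function can be applied to an automaton and a copy of it with negated labels, following the remark that the negative beta-integers are recognized by the same automaton with negative labels on edges.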

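The formal addition of elements of $\mathbb{Z}_\beta^+$ consists in adding elements without
carry. More precisely,

$$L_{\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+} = \{(a_N + b_N) \cdots (a_0 + b_0) \mid a_N \cdots a_0,\ b_N \cdots b_0 \in L_{\mathbb{Z}_\beta^+}\} \subset \{0, \ldots, 2\lfloor\beta\rfloor\}^*.$$

Similarly the formal subtraction of elements of $\mathbb{Z}_\beta^+$ is defined by

$$L_{\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+} = \{(a_N - b_N) \cdots (a_0 - b_0) \mid a_N \cdots a_0,\ b_N \cdots b_0 \in L_{\mathbb{Z}_\beta^+}\} \subset \{-\lfloor\beta\rfloor, \ldots, \lfloor\beta\rfloor\}^*.$$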

From the construction of the sum automaton follows 

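Proposition 2. If $d_\beta(1)$ is finite or eventually periodic, the set $L_{\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+}$
corresponding to the formal addition and the set $L_{\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+}$ corresponding to
the formal subtraction are recognizable by a finite automaton.

By Equation (1), $\mathcal{A}_{\mathbb{Z}_\beta - \mathbb{Z}_\beta} = \mathcal{A}_{\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+} \cup \mathcal{A}_{\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+} \cup \mathcal{A}_{-(\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+)}$. The automata
given by this construction are generally not minimal.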
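Example 1. In the case where $\beta = \frac{1+\sqrt{5}}{2}$, $d_\beta(1) = 11$ and $d^*_\beta(1) = (10)^\omega$. We give
below the minimal automata $\mathcal{A}_{\mathbb{Z}_\beta^+}$, $\mathcal{A}_{\mathbb{Z}_\beta^-}$, $\mathcal{A}_{\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+}$, and $\mathcal{A}_{\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+}$. Initial states are
indicated by an incoming arrow, and every state is terminal.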




It is an interesting question to see what is the result of formal addition or
subtraction when the system is of finite type. First recall that, from the result
of Parry cited in Sect. 2, if $d_\beta(1) = t_1 \cdots t_m$, the set of minimal forbidden words
for $\mathbb{Z}_\beta^+$ is the set $I_\beta = \{t_1 \cdots t_m\} \cup \{t_1t_2 \cdots t_{p-1}x_p \mid t_p < x_p \le t_1,\ 2 \le p \le
m,\ x_p \in A_\beta\}$.
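The set $I_\beta$ is easy to enumerate from $d_\beta(1)$. The following sketch uses an assumed function name and represents words as tuples of digits; it reads the condition on $x_p$ as $t_p < x_p \le t_1$.

```python
def minimal_forbidden_words(t):
    """Set I_beta for a finite d_beta(1) = t_1 ... t_m, given as a list of digits."""
    m = len(t)
    words = {tuple(t)}                                 # the word t_1 ... t_m itself
    for p in range(2, m + 1):
        for x in range(t[p - 1] + 1, t[0] + 1):        # t_p < x_p <= t_1
            words.add(tuple(t[:p - 1]) + (x,))         # t_1 ... t_{p-1} x_p
    return words
```

For the golden ratio, `minimal_forbidden_words([1, 1])` contains the single word 11, in agreement with the automaton of Example 1.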

Proposition 3. If $d_\beta(1) = t_1 \cdots t_m$ is finite, the formal subtraction $\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+$
defines a system of finite type.

Proof. Recall that if a word is admissible, any word with smaller nonnegative
digits is admissible as well. Thus the set of forbidden words for the formal
subtraction $\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+$ is equal to $\{w, \bar{w} \mid w \in I_\beta\}$, which is finite. □

The result for formal addition is quite different. 

Proposition 4. If $d_\beta(1) = t_1 \cdots t_m$ is finite, the formal addition $\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+$
defines a system of finite type if and only if $t_m = t_1$ and, for each $2 \le i \le m - 1$,
$t_i = t_1$ or $t_i = 0$.

Corollary 1. If $\beta < 2$ and $d_\beta(1)$ is finite then the formal addition $\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+$
defines a system of finite type.
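The criterion of Proposition 4 is a simple test on the digits of $d_\beta(1)$. The sketch below only encodes that test (the function name is an assumption); it does not construct the forbidden words themselves.

```python
def addition_is_finite_type(t):
    """Criterion of Proposition 4 for a finite d_beta(1) = t_1 ... t_m (list of digits)."""
    first, last, middle = t[0], t[-1], t[1:-1]
    # t_m = t_1 and every t_i with 2 <= i <= m - 1 equals t_1 or 0
    return last == first and all(ti in (0, first) for ti in middle)
```

When $\beta < 2$ every digit of $d_\beta(1)$ is 0 or 1, and both the first and the last digit of a finite $d_\beta(1)$ equal 1, so the test always succeeds; this is the content of Corollary 1.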

The proof of Proposition 4 follows from several technical results. 

Lemma 1. Suppose that $d_\beta(1) = t_1 \cdots t_m$, and that there exists $2 \le j \le m$ with
$0 < t_j < t_1$ (so $t_1 \ge 2$), and $t_i = 0$ or $t_i = t_1$ for $2 \le i \le j - 1$. Then the set of
minimal forbidden words in the formal addition is infinite.
Proof. For any $k \ge 1$ consider the word

$$w^{(k)} = [(t_1 + t_2)(t_2 + t_3) \cdots (t_{j-2} + t_{j-1})(t_{j-1} + t_j - 1)(t_j - 1 + t_1)]^k (t_1 + t_2)(t_2 + t_3) \cdots (t_{m-1} + t_m).$$

Let $u^{(k)} = (2t_1 - 1)w^{(k)}$. First we show that $u^{(k)}$ is forbidden in the formal
addition system. This comes from the fact that $u^{(k)}$ is necessarily the digit-sum
of the two words $x^{(k)} = (t_1 - 1)[t_1 \cdots t_{j-1}(t_j - 1)]^k t_1 \cdots t_{m-1}$ and $y^{(k)} =
t_1[t_2 \cdots t_{j-1}(t_j - 1)t_1]^k t_2 \cdots t_m$. Clearly $y^{(k)}$ is not admissible for $\mathbb{Z}_\beta^+$ because it
ends in the forbidden word $t_1 \cdots t_m$, and $x^{(k)}$ is admissible for $\mathbb{Z}_\beta^+$ and maximal
in the sense that adding 1 to one of its digits makes the word not admissible.

Note that all strict prefixes of $y^{(k)}$ are admissible for $\mathbb{Z}_\beta^+$, so all strict prefixes
of $u^{(k)}$ are also admissible.

Now we show that the word $w^{(k)}$ is admissible in the formal addition system.
By hypothesis the digits $(t_i + t_{i+1})$ for $1 \le i \le j - 2$ are equal to $2t_1$, $t_1$ or
0. So $w^{(k)}$ can be obtained as the digit-sum of $v^{(k)}$ and $z^{(k)}$ with the following
method: a digit $2t_1$ of $w^{(k)}$ gives a digit $t_1$ in $v^{(k)}$ and a digit $t_1$ in $z^{(k)}$; a digit
$t_1$ of $w^{(k)}$ gives a digit $t_1 - 1$ in $v^{(k)}$ and a digit 1 in $z^{(k)}$; a digit 0 of $w^{(k)}$ gives
a digit 0 in $v^{(k)}$ and in $z^{(k)}$. Since $0 < t_j < t_1$, the digits $t_{j-1} + t_j - 1$ and
$t_j - 1 + t_1$ are $\le 2t_1 - 2$, which is the sum of $t_1 - 1$ and $t_1 - 1$. The suffix
$(t_j - 1 + t_1)(t_1 + t_2)(t_2 + t_3) \cdots (t_{m-1} + t_m)$ of $w^{(k)}$ is thus the digit-sum of
$a\,t_1t_2 \cdots t_{m-1}$, with $a \le t_1 - 1$, and of $b\,t_2t_3 \cdots t_m$, with $b \le t_1 - 1$. Hence $w^{(k)}$ is
the digit-sum of $v^{(k)}$ and $z^{(k)}$ which are both admissible for $\mathbb{Z}_\beta^+$. □



Lemma 2. If $d_\beta(1) = t_1 \cdots t_m$ is finite and if $t_m = t_1$ and, for each $2 \le i \le
m - 1$, $t_i = t_1$ or $t_i = 0$ then the formal addition is a system of finite type.

Proof. As in Lemma 1 we consider the word $u^{(k)} = (2t_1 - 1)w^{(k)}$ with $t_j = t_1$ for a fixed j,
$2 \le j \le m$. The difference with Lemma 1 is that now the suffix $s = (t_j - 1 +
t_1)(t_1 + t_2)(t_2 + t_3) \cdots (t_{m-1} + t_m)$ is not admissible. Since $t_j = t_1$, s can be the
digit-sum of $(t_1 - 1)t_1 \cdots t_{m-1}$ and $t_1t_2 \cdots t_m$, or of $(t_1 - 1)t_1 \cdots t_{i-1}(t_i +
1)t_{i+1} \cdots t_{m-1}$ and $t_1t_2 \cdots t_{i-1}(t_i - 1)t_{i+1} \cdots t_m$ if $t_i \ne 0$, for $2 \le i \le m - 1$.
But none of the factors $t_1 \cdots t_{i-1}(t_i + 1)$ is admissible for $\mathbb{Z}_\beta^+$. By considering all
the positions $2 \le j \le m$, we see that it is not possible to construct an
infinite family of minimal forbidden words of type $u^{(k)}$. □

4 A Family of Finite Sets F 

When $\beta$ is a Pisot number, the set of beta-integers $\mathbb{Z}_\beta$ is a Meyer set so there
exists a finite set F such that $\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$. Our goal is to construct sets
F as small as possible for $\mathbb{Z}_\beta$.

Remark 1. Note that there exist several sets F with minimal cardinality. For
example when $\beta = (1+\sqrt{5})/2$ then $\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$, with $F = \{0, \beta - 1, -\beta + 1\}$,
or $F = \{0, \beta - 2, -\beta + 2\}$ or $F = \{0, \beta - 1, -\beta + 2\}$.



We first define finite sets from which can be extracted the finite sets F. 
Lemma 3. Let $\beta$ be a Pisot number of degree d, let $I \subset \mathbb{R}$ be an interval of
length 1 and let U be the following set

$$U = \left\{ x \in \mathbb{Z}[\beta] \ \middle|\ x \in I \text{ and } \forall\, 2 \le j \le d,\ |x^{(j)}| < \frac{3\lfloor\beta\rfloor}{1 - |\beta^{(j)}|} \right\},$$

where $x^{(2)}, \ldots, x^{(d)}$ are the algebraic conjugates of x. Then U is finite, and there
exists a subset F of U such that $\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$.

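Proof. As the maximal distance between two consecutive points of $\mathbb{Z}_\beta$ is equal
to 1, one can find a set F such that $\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$ in any interval I of length
1.

Fix an interval I of length 1 and $F \subset I$ as small as possible such that
$\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$. Let $x \in F$, then $x \in (\mathbb{Z}_\beta - \mathbb{Z}_\beta) - \mathbb{Z}_\beta$ and can be written as

$$x = \sum_{i=0}^{N}(a_i - b_i)\beta^i - \sum_{i=0}^{N} c_i\beta^i \quad \text{with } |a_i|, |b_i|, |c_i| \le \lfloor\beta\rfloor.$$

Thus

$$\forall\, 2 \le j \le d, \quad x^{(j)} = \sum_{i=0}^{N}(a_i - b_i - c_i)(\beta^{(j)})^i \quad \text{with } |a_i - b_i - c_i| \le 3\lfloor\beta\rfloor.$$

As $\beta$ is a Pisot number, for all $j \ge 2$, $|\beta^{(j)}| < 1$ and $\sum_{i=0}^{N} |\beta^{(j)}|^i < (1 - |\beta^{(j)}|)^{-1}$.
We obtain in this way the announced bound on the moduli of the
conjugates of x and $x \in U$. So F is a subset of U.

As it contains only points of $\mathbb{Z}[\beta]$ with bounded modulus and all of whose
conjugates have bounded modulus, the set U is finite. Thus F is a finite set. □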

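The choice of any interval $I \subset\ ]-1, 1[$ of length 1 allows us to reduce the
cardinality of the set containing a set F.

Lemma 4. Let $\beta$ be a Pisot number of degree d, let $I \subset\ ]-1, 1[$ be an interval
of length 1 and let U' be the following finite set

$$U' = \left\{ x \in \mathbb{Z}[\beta] \ \middle|\ x \in I \text{ and } \forall\, 2 \le j \le d,\ |x^{(j)}| < \frac{2\lfloor\beta\rfloor}{1 - |\beta^{(j)}|} \right\}.$$

Then there exists a subset F of U' such that $\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$.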

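Proof. We choose here $I \subset\ ]-1, 1[$ of length 1 and improve the bound on the
moduli of the conjugates of x given in Lemma 3 by considering the decomposition

$$\mathbb{Z}_\beta - \mathbb{Z}_\beta = (\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+) \cup (\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+) \cup -(\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+).$$

More precisely let $x \in F \subset I$, then $x \in (\mathbb{Z}_\beta - \mathbb{Z}_\beta) - \mathbb{Z}_\beta$ and can be written as

$$x = \sum_{i=0}^{N}(a_i - b_i)\beta^i - \sum_{i=0}^{N} c_i\beta^i.$$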
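We study $|a_i - b_i - c_i|$ according to the signs of $a_i$, $b_i$ and $c_i$. In $\mathbb{Z}_\beta^+ - \mathbb{Z}_\beta^+$, the
coefficients satisfy $|a_i - b_i| \le \lfloor\beta\rfloor$. Moreover when $F \subset\ ]-1, 1[$, $\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+ \subset \mathbb{Z}_\beta^+ + F$
and $-(\mathbb{Z}_\beta^+ + \mathbb{Z}_\beta^+) \subset \mathbb{Z}_\beta^- + F$, then we have $|a_i - c_i| \le \lfloor\beta\rfloor$. So when $F \subset\ ]-1, 1[$,
we get in all cases $|a_i - b_i - c_i| \le 2\lfloor\beta\rfloor$. Thus

$$\forall\, 2 \le j \le d, \quad x^{(j)} = \sum_{i=0}^{N}(a_i - b_i - c_i)(\beta^{(j)})^i \quad \text{with } |a_i - b_i - c_i| \le 2\lfloor\beta\rfloor,$$

and the announced bound on the moduli of the conjugates of x holds true. □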

Example 2. Let $\beta$ be a quadratic Pisot unit, then the set U' contains 5 points.
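The count of Example 2 can be checked numerically in one concrete case. The sketch below makes several assumptions that are not in the text: it fixes $\beta$ to the golden ratio, chooses the interval $I = [-1/2, 1/2)$, writes the elements of $\mathbb{Z}[\beta]$ as $p + q\beta$ with integers p and q, and uses floating-point arithmetic (which is safe here because none of the candidate points sits on a boundary).

```python
import math

def u_prime_golden(lo=-0.5, hi=0.5):
    """Points p + q*beta of U' for the golden ratio and I = [lo, hi), as pairs (p, q)."""
    sqrt5 = math.sqrt(5)
    beta = (1 + sqrt5) / 2
    conj = (1 - sqrt5) / 2                              # algebraic conjugate of beta
    bound = 2 * math.floor(beta) / (1 - abs(conj))      # bound of Lemma 4
    # x - x' = q * sqrt(5), hence |q| * sqrt(5) < max|I| + bound
    q_max = int((bound + max(abs(lo), abs(hi))) / sqrt5) + 1
    points = []
    for q in range(-q_max, q_max + 1):
        p = math.ceil(lo - q * beta)                    # only p with p + q*beta >= lo, minimal
        x = p + q * beta
        if lo <= x < hi and abs(p + q * conj) < bound:
            points.append((p, q))
    return points


points = u_prime_golden()
```

With this choice of interval the enumeration returns five points, one for each q between -2 and 2. Other admissible intervals may place a candidate exactly on the bound, in which case an exact arithmetic in $\mathbb{Z}[\beta]$ would be needed instead of floats.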

5 A First Reduction of the Cardinality of the Sets 
Containing F 

In order to reduce the size of the sets containing F we study the properties of 
the elements of F. 

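Lemma 5. Let $\beta$ be a Pisot number and let $F \subset (\mathbb{Z}_\beta - \mathbb{Z}_\beta) - \mathbb{Z}_\beta$. If $f \in F$
there exist a nonnegative integer N, and two finite words $b_N \cdots b_0$ and $a_N \cdots a_0$
respectively admissible for $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ and $\mathbb{Z}_\beta$ such that

$$f_0 = f, \quad \forall\, 0 \le i \le N,\ f_{i+1} = \frac{f_i - (b_i - a_i)}{\beta}, \quad \text{and } f_{N+1} = 0.$$

Proof. An element f in F can be written as $f = y - x$ with $x = \sum_{i=0}^{N} a_i\beta^i \in \mathbb{Z}_\beta$,
$a_N \cdots a_0$ being admissible for $\mathbb{Z}_\beta$, and $y = \sum_{i=0}^{N} b_i\beta^i \in \mathbb{Z}_\beta - \mathbb{Z}_\beta$,
$b_N \cdots b_0$ being admissible for $\mathbb{Z}_\beta - \mathbb{Z}_\beta$. Note that leading 0's are allowed.

With these notations we get for all $0 \le i \le N$, $f_i = \sum_{j=0}^{N-i}(b_{j+i} - a_{j+i})\beta^j$
and $f_{N+1} = 0$. □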

Let $V = \left\{ x \in \mathbb{Z}[\beta] \ \middle|\ |x| < \frac{2\lfloor\beta\rfloor}{\beta - 1} \text{ and } \forall\, 2 \le j \le d,\ |x^{(j)}| < \frac{2\lfloor\beta\rfloor}{1 - |\beta^{(j)}|} \right\}$. V is a
finite set, with the following property that for all $f \in ((\mathbb{Z}_\beta - \mathbb{Z}_\beta) - \mathbb{Z}_\beta) \cap U'$,
the elements $f_0, \ldots, f_N$ of any sequence associated with f according to Lemma
5 belong to V. Indeed, from Lemmas 4 and 5, when $F \subset U'$, for all i, $|b_i - a_i| \le
2\lfloor\beta\rfloor$. So for $0 \le i \le N$ and $2 \le j \le d$, the conjugates of $f_i$ satisfy
$|f_i^{(j)}| < 2\lfloor\beta\rfloor/(1 - |\beta^{(j)}|)$. Moreover the smallest C such that $|x| < C$ implies
$|(x - (b - a))/\beta| < C$ is $C = 2\lfloor\beta\rfloor/(\beta - 1)$.

Following [7], we define a directed graph G whose set of vertices is the set V

and having an edge $x \to y$ labelled by (b, a) if $y = (x - (b - a))/\beta$.

Lemma 6. Let $F \subset U'$ be a minimal set satisfying $\mathbb{Z}_\beta - \mathbb{Z}_\beta \subset \mathbb{Z}_\beta + F$. Let $V_0$
be the subset of V of vertices connected to 0 in G. Then $F \subset V_0$.
From each vertex f of G which is in U' we look for a path from f to 0 in
G which is successful in $\mathcal{A}_{\mathbb{Z}_\beta - \mathbb{Z}_\beta} \times \mathcal{A}_{\mathbb{Z}_\beta}$. Note that in G words are processed
least significant digit first, contrarily to the automata for $\mathbb{Z}_\beta$ and $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ where
words are processed most significant digit first (i.e. from left to right). So we first
define an automaton $\mathcal{G}_f$ having as underlying transition graph G with reversed
edges, 0 as initial state and f as terminal state. We then compute the intersection
automaton $\mathcal{I}_f = (\mathcal{A}_{\mathbb{Z}_\beta - \mathbb{Z}_\beta} \times \mathcal{A}_{\mathbb{Z}_\beta}) \cap \mathcal{G}_f$. The following result then holds true.

Proposition 5. An element f of U' is in $V_0$ if and only if the language
recognized by $\mathcal{I}_f$ is nonempty.

Remark 2. The number of states of the automaton $\mathcal{I}_f$ constructed above is
$O(K^3 \times |V|)$ where K is the number of states of $\mathcal{A}_{\mathbb{Z}_\beta^+}$ and |V| is the number of
vertices of G.

6 Minimization of the Cardinality of the Set F 

The finite sets $U' \cap V_0$ obtained by the previous construction are not minimal.
An element $y \in \mathbb{Z}_\beta - \mathbb{Z}_\beta$ can be close to two different points of $\mathbb{Z}_\beta$, for example
such that $x < y < x'$ with $x, x' \in \mathbb{Z}_\beta$ and $y = x + f = x' + f'$ with $f, f' \in U' \cap V_0$.

Theorem 1. A minimal set $F \subset U' \cap V_0$ can be computed by an algorithm
exponential in time and space. It consists in building a transducer which rewrites
a representation of an element of $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ into its representation in $\mathbb{Z}_\beta + F$.

Proof. To find a minimal set $F \subset U' \cap V_0$ we proceed in two steps.

First for each $f \in U' \cap V_0$, we define a deterministic automaton $\mathcal{A}_f$ that
recognizes the set of admissible words for $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ that appear as the first
component of the labels of the successful paths in $\mathcal{I}_f$. The automaton $\mathcal{A}_f$ is
obtained by erasing the second component of the labels (that belongs to $\mathbb{Z}_\beta$)
of the edges of $\mathcal{I}_f$ and determinizing the automaton defined in this way. The
determinization of automata is based on the so-called subset construction (see
[4]), which is exponential in space, and the automaton $\mathcal{A}_f$ has a number of states that is at most exponential in the number of states of $\mathcal{I}_f$.
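The two operations of this step, erasing the second component of the labels and applying the subset construction, can be sketched generically. The code below is an illustration on an arbitrary small automaton, not on an actual $\mathcal{I}_f$; the function name and the encoding of edges as triples `(p, (b, a), q)` are assumptions made for the example.

```python
def project_and_determinize(initial, edges):
    """Erase the second label component, then apply the subset construction.

    `edges` is an iterable of (p, (b, a), q). Returns the start subset and a
    dictionary {(subset, b): subset} describing the deterministic automaton.
    """
    start = frozenset(initial)
    dfa, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        moves = {}
        for (p, (b, _a), q) in edges:           # keep only the first component b
            if p in subset:
                moves.setdefault(b, set()).add(q)
        for b, targets in moves.items():
            target = frozenset(targets)
            dfa[(subset, b)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start, dfa


edges = [(0, (1, 0), 1), (0, (1, 1), 2), (1, (0, 0), 0), (2, (0, 0), 2)]
start, dfa = project_and_determinize({0}, edges)
```

A subset is terminal when it contains a terminal state of the original automaton. Only the subsets reachable from the start subset are built, so the exponential bound is a worst case rather than the typical size.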

Next we look amongst all subsets of $U' \cap V_0$ for the smallest set F such that
the language recognized by $\bigcup_{f \in F}\mathcal{A}_f$ contains an admissible representation of
each element of $\mathbb{Z}_\beta - \mathbb{Z}_\beta$. To test the inclusion, we compute the complement $\mathcal{C}_F$
of $\bigcup_{f \in F}\mathcal{A}_f$. Then the language recognized by $\bigcup_{f \in F}\mathcal{A}_f$ contains an admissible
representation of each element of $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ if and only if the intersection of $\mathcal{C}_F$ and
$\mathcal{A}_{\mathbb{Z}_\beta - \mathbb{Z}_\beta}$ is empty. Note that the complexity of the search amongst all subsets of
$U' \cap V_0$ is exponential in time.

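From the set $F$ obtained above, we define a transducer that provides, given
$y = \sum_{i=0}^{p} b_i \beta^i \in \mathbb{Z}_\beta - \mathbb{Z}_\beta$ where $b_p \cdots b_0$ is admissible for $\mathbb{Z}_\beta - \mathbb{Z}_\beta$, a
decomposition $(a_p \cdots a_0, f)$ where $a_p \cdots a_0$ is admissible for $\mathbb{Z}_\beta$, $f \in F$ and
$y = \sum_{i=0}^{p} a_i \beta^i + f$.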

Consider the intersection automaton $\mathcal{I}_F = (\mathcal{A}_{\mathbb{Z}_\beta - \mathbb{Z}_\beta} \times \mathcal{A}_{\mathbb{Z}_\beta}) \cap \mathcal{G}_F$ ($F$ is the
set of terminal states of $\mathcal{G}_F$). For any element $y$ admissible for $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ there
exists $f \in F$ such that $y$ is the first component of the label of a successful path $w$
ending in $(s, f)$ where $s$ is any state of $\mathcal{A}_{\mathbb{Z}_\beta - \mathbb{Z}_\beta} \times \mathcal{A}_{\mathbb{Z}_\beta}$ (by construction all states
are terminal). Consequently we get $y = x + f$ where $x$ is the second component
of the label of the same path $w$ and so is admissible for $\mathbb{Z}_\beta$.

More generally the first component of the labels of the edges in $\mathcal{I}_F$ can be
interpreted as the inputs admissible for $\mathbb{Z}_\beta - \mathbb{Z}_\beta$ of the transducer, the second
component as the corresponding outputs admissible for $\mathbb{Z}_\beta$. The associated
element of $F$ is given by the first component of the label of the state where the
path ends. □

To conclude, the method used here for determining minimal sets $F$ could be
generalized to more general Meyer sets related to integral matrices having $\beta$
as spectral radius.
Abstract. A regular language L is called dense if the fraction $f_m$ of
words of length m over some fixed signature that are contained in L tends
to one if m tends to infinity. We present an algorithm that computes the
number of accumulation points of $(f_m)$ in polynomial time, if the regular
language L is given by a finite deterministic automaton, and can then
also efficiently check whether L is dense. Deciding whether the least
accumulation point of $(f_m)$ is greater than a given rational number, however,
is coNP-complete. If the regular language is given by a nondeterministic
automaton, checking whether L is dense becomes PSPACE-hard. We will
formulate these problems as convergence problems of partially observable
Markov chains, and reduce them to combinatorial problems for periodic
sequences of rational numbers.



1 Introduction 

In computational logics, the complexity of almost-sure validity has become a
fundamental question for logical formalisms, besides e.g. the complexity of membership
testing and validity. Grandjean [9] showed that almost-sure validity for first-order
logic in the finite is PSPACE-complete, whereas validity in the finite is
undecidable, by Trakhtenbrot’s theorem.

If we are considering formal languages, the concept corresponding to
almost-sure validity is the limit behavior of the density of a language. The density of
a language L over the alphabet $\Sigma$ is the sequence $(f_m)_{m=0}^{\infty}$ of fractions of
words of length m in the language, $f_m =_{\mathrm{df}} |L \cap \Sigma^m| / |\Sigma|^m$. The density of regular
languages has already been studied in [2], and the methodology to analyze it using
formal power series is standard by now [11]. It is known that $(f_m)_m$ has finitely
many rational accumulation points [2]. However, to the best of our knowledge
the algorithmic complexity of e.g. computing the number of accumulation points
of $(f_m)$ has not yet been discussed. We show that the computation of $\liminf f_m$
is coNP-hard, whereas $\lim f_m$ can be computed in time polynomial in n, where n is the size of
Fig. 1. Periodic and reducible Markov chain where the probability that the system is
in the set {2, 5, 7} converges to 



the deterministic automaton accepting L. If the language is given by a nonde- 
terministic automaton, the problem to decide whether the accepted language is 
dense becomes PSPACE-hard. 

Partially observable Markov chains. The density problem for a deterministic
automaton can be translated into a convergence problem for Markov chains.
We view $f_m$ as the probability that a word of length m, chosen uniformly at
random, leads to an accepting state in the given finite deterministic automaton.
The automaton can then be considered as the state space of a finite Markov
chain, having transitions with probability $1/|\Sigma|$ for every labeled edge in the finite
deterministic automaton. The accepting states of the automaton form the so-called
set of observable states in the Markov chain. We are interested in the probability
that the system is, in the far future, in this observable set of states.
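
As a small illustration of this probabilistic view, the following Python sketch computes the first values of the density for a complete deterministic automaton by counting, for every state, how many words of the current length lead to it. The function name `density`, the dictionary encoding of the automaton and the two example automata are assumptions made for this sketch; they do not come from the paper.

```python
from fractions import Fraction


def density(transitions, start, accepting, alphabet_size, m_max):
    """Return [f_0, ..., f_m_max] for a complete DFA.

    transitions[q] lists the successor of state q for each letter
    (hypothetical encoding chosen for this sketch).
    """
    counts = {start: 1}  # number of words of the current length reaching each state
    result = []
    for m in range(m_max + 1):
        accepted = sum(c for q, c in counts.items() if q in accepting)
        result.append(Fraction(accepted, alphabet_size ** m))
        nxt = {}
        for q, c in counts.items():
            for r in transitions[q]:
                nxt[r] = nxt.get(r, 0) + c
        counts = nxt
    return result


# Words of even length over {a, b}: the density alternates between 1 and 0.
even_length = {0: [1, 1], 1: [0, 0]}
# Words containing at least one 'a': the density is 1 - 2^(-m), so the language is dense.
contains_a = {0: [1, 0], 1: [1, 1]}

print(density(even_length, 0, {0}, 2, 5))
print(density(contains_a, 0, {1}, 2, 3))
```

The first automaton has the two accumulation points 0 and 1, which is the periodic behaviour the paper is concerned with; the second one has a limit density.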

The specification and verification of long-run average properties of proba- 
bilistic systems was recently studied by de Alfaro [6]. De Alfaro also presents 
efficient algorithms for model checking these long-time average properties us- 
ing stable-state distributions of Markov chains. However, in general the Markov 
chain for the automaton is not aperiodic, and we are not interested in the aver- 
age behaviour, but rather in the probability of a property of the system at some 
specific time point in the far future. There might be a limit probability, even 
though there is no stable-state distribution (see Fig. 1). 

We show that computing the minimal or maximal accumulation point of 
$(f_m)$ is coNP-complete. This means that we cannot expect to find an efficient
algorithm that computes the minimum probability that a system has a certain 
property after a long run. However, it is possible to compute all the accumulation 
points in time polynomial in their number. In particular, if a property has a limit 
probability, i.e. if there is only one accumulation point, we will present an efficient 
algorithm to determine its value. Another problem will be the computation of 
the number of accumulation points. 

Periodic Sequences of Rational Numbers. We will reduce the probabilistic
problems above to equivalent combinatorial problems for periodic sequences. A
periodic sequence over some field X is an infinite sequence $(\alpha[m])_{m=0}^{\infty}$ of elements in
X such that there is an integer p > 0 so that $\alpha[m] = \alpha[m+p]$ for all $m \ge 0$. The
least such integer is called the period $|\alpha|$ of $\alpha$. If we add two sequences $\alpha$ and $\beta$
  1 1 3 0 2 2 1 1 3 0 2 2 1 1 3 ...
+ 2 4 4 6 6 3 3 5 5 7 2 4 4 6 6 ...
+ 6 4 2 3 1 4 5 3 1 2 5 3 4 2 0 ...
= 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 ...



Fig. 2. A visualization of a sum of periodic sequences with period one. 



componentwise, the result is obviously again a periodic sequence, and the period
is at most $\mathrm{lcm}(|\alpha|, |\beta|)$. But sometimes a set of periodic sequences adds up to a
sequence with a shorter period. Consider for example the sequences in Figure 2.
The largest possible period of the sum of the sequences is lcm(6, 10, 15) = 30, but
in fact their sum has period one. We will investigate how to compute the sum 
of a set of periodic sequences without evaluating a possibly exponential number 
of entries in the sum. 
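
For reference, the brute-force computation that the paper wants to avoid can be sketched in a few lines of Python: it materializes all lcm-many entries of the sum and then searches for the shortest period. The function names are assumptions of this sketch; the three input sequences are the rows of Figure 2 (periods 6, 10 and 15).

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def minimal_period(seq):
    """Shortest p such that seq (one full period, repeated forever) has period p."""
    n = len(seq)
    for p in range(1, n + 1):
        # The minimal period always divides the length of a full period.
        if n % p == 0 and all(seq[i] == seq[i % p] for i in range(n)):
            return p
    return n


def sum_of_periodic(sequences):
    """Return one minimal period of the componentwise sum (brute force)."""
    length = reduce(lcm, (len(s) for s in sequences), 1)
    total = [sum(s[i % len(s)] for s in sequences) for i in range(length)]
    return total[:minimal_period(total)]


figure_2 = [
    [1, 1, 3, 0, 2, 2],
    [2, 4, 4, 6, 6, 3, 3, 5, 5, 7],
    [6, 4, 2, 3, 1, 4, 5, 3, 1, 2, 5, 3, 4, 2, 0],
]
print(sum_of_periodic(figure_2))  # prints [9]
```

This sketch evaluates 30 entries here, but in general the number of entries grows with the least common multiple of the periods, which can be exponential in the input size.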



2 Preliminaries 

The long run behavior of the density of a deterministic finite automaton can be 
seen as a probabilistic process. If the regular language is given by a nondetermin- 
istic automaton, we first have to determinize the automaton, which might lead 
to an exponential blow-up of the size of the automaton. In fact, in this case the 
problem whether the language has a limit density becomes hard for PSPACE. 

Proposition 1. The problem to decide for a given nondeterministic finite au- 
tomaton whether it accepts a dense language is PSPACE-hard. 

Proof (sketch). We can adapt the classical proof showing that the
non-universality problem for nondeterministic finite automata is PSPACE-hard [1]:
Let M be a Turing machine accepting a language in polynomial space and let
x be an input. We can encode machine configurations (i.e., tape contents, state and
head position) as words w, and computations of M as words $\#w_1\# \cdots \#w_k\#$.
We can construct a (nondeterministic) automaton of size polynomial in M and
x that rejects exactly the words $\#w_1\# \cdots \#w_k\#v$ where $w_1, \ldots, w_k$ represents
an accepting computation of M on x and v is any word.

Clearly if there is such an accepting computation of M on x and n is the
length of its representation, then for every $m \ge n$ the density $f_m$ of the language
accepted by the automaton is at most $1 - |\Sigma|^{-n} < 1$, because every word that
extends this representation is rejected. Conversely, this language is dense (in fact,
universal) if x is not accepted by M. □

To deal with the deterministic case, we recall in this section some notions 
common in the Markov chain literature. 



Definition 1. A partially-observable Markov chain (POM) can be described by
a tuple $(V, A, s_0, O)$ consisting of a finite set V of states and a function
$A : V^2 \to [0, 1]$ specifying transition probabilities¹, i.e. we have that
$\sum_{u \in V} A(u, v) = 1$ for all $v \in V$. The $|V|$-dimensional vector $s_0$ is called the
initial distribution. The set $O \subseteq V$ denotes the set of observable states.

If we identify V with $\{1, \ldots, |V|\}$, the transition function A can be seen as a
$|V| \times |V|$-matrix of rational numbers. This matrix $A = (a_{ij})_{i,j \in V}$ determines a
directed weighted quasi graph, the transition graph, where there is an edge from
v to u if $a_{uv} \neq 0$. We will freely use graph theoretic notions, and call a POM
strongly connected (or irreducible), if its transition graph is. Strongly connected 
components of the transition graph with no outgoing edges are called terminal 
components. For simplicity we will identify a POM with its transition matrix or 
its transition graph, if initial distribution and labeling are clear from the context. 

The periodicity of a strongly connected component in a POM is the greatest 
common divisor of the length of all the cycles in the underlying transition graph. 
The periodicity of a POM is the least common multiple of the periodicities of 
its terminal components. A POM is called aperiodic if its periodicity is one. An 
aperiodic and irreducible POM is called ergodic. We can draw POMs as graphs 
like in Figure 1, where we can see a transition graph of periodicity four. 
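
The periodicity of a strongly connected component can be computed without enumerating cycles. The following Python sketch uses a standard breadth-first-search technique (not taken from the paper): with `dist` the BFS distance from an arbitrary root, the gcd of `dist[u] + 1 - dist[v]` over all edges (u, v) equals the gcd of all cycle lengths. The adjacency-list encoding and the two example graphs are assumptions of this sketch.

```python
from collections import deque
from math import gcd


def periodicity(adj, root):
    """gcd of all cycle lengths of a strongly connected graph.

    adj maps every vertex to the list of its successors.
    """
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    g = 0
    for u in adj:
        for v in adj[u]:
            # Tree edges contribute 0; every other edge closes a walk whose
            # length is congruent to 0 modulo the period.
            g = gcd(g, dist[u] + 1 - dist[v])
    return g


four_cycle = {0: [1], 1: [2], 2: [3], 3: [0]}   # a single cycle of length 4
mixed = {0: [1], 1: [0, 2], 2: [0]}             # cycles of length 2 and 3

print(periodicity(four_cycle, 0))  # prints 4
print(periodicity(mixed, 0))       # prints 1
```

The second graph is aperiodic because its two cycle lengths are coprime.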

A distribution s is a $|V|$-dimensional vector of numbers from [0, 1]. We denote
the i-th component of this vector by $(s)_i$. A run of a POM is an (infinite)
sequence $(s_m)_{m=0}^{\infty}$ of distributions, where $s_0$ is the initial distribution, and $s_i$ is
defined to be $A s_{i-1}$, for $i \ge 1$. A stable-state distribution s is a distribution such
that $As = s$.

For POMs we are not interested in stable-state distributions, but in the
long-run behaviour of the sequence of probabilities $(f_m)_{m=0}^{\infty}$ that the system is in the
set of observable states, where $f_m := \sum_{v \in O} (s_m)_v$. These are the problems for
POMs that we investigate:

1. Check whether $(f_m)$ converges.

2. Determine the minimal accumulation point of $(f_m)$.

3. Determine the number of accumulation points of $(f_m)$.

4. Determine the accumulation points of $(f_m)$.

Convergence of aperiodic and irreducible Markov chains reduces to finding a 
stable state distribution of the Markov chain (see e.g. [3]): 

Theorem 1 (Basic Limit Theorem). Let A be an aperiodic and irreducible
POM. Then $\lim_{m \to \infty} A^m s$ exists for all initial distributions s, and is independent
of s.

Moreover, we can efficiently find this limit distribution by finding the eigenvector 
to the eigenvalue 1 of A, i.e. solving a linear equation system. Since the POMs 

¹ Note that we assume that the transition probabilities are real numbers. When
analyzing the running time of algorithms dealing with POMs we will only count the
number of additions and multiplications of field elements that we have to perform. 
For the application to densities of regular languages it suffices to represent the prob- 
abilities by rational numbers, and thus we will separately mention how to deal with 
rational numbers. 
Compute-Limit(B, s)

  Compute terminal components $B_1, \ldots, B_q$.
  for $v : v \notin B_1 \cup \cdots \cup B_q$
      do reduce self loops at v
         reduce edges to v.
  for $i = 1, \ldots, q$
      do $\lambda_i \leftarrow \sum_{v \in B_i} (s)_v$
         $b_i \leftarrow$ Eigenvector to eigenvalue 1 of
                  the transition matrix of $B_i$.
  return the vector $\sum_i \lambda_i b_i$.



Fig. 3. Computing the density of an aperiodic POM. This procedure is used by the
algorithm in Figure 4. The sub-procedures reduce self loops and reduce edges, and
correctness proofs can be found at [4].



Compute-Period(A, s)

  Compute subchains $A_1, \ldots, A_p$
      induced by the terminal components
      of periodicity $l_1, \ldots, l_p$.
  for $i = 1, \ldots, p$; $j = 0, \ldots, l_i - 1$
      do $\alpha_i[j] \leftarrow \sum_{v \in O} (\text{Compute-Limit}(A_i^{l_i}, A_i^{j} s))_v$.
  return $\alpha_1, \ldots, \alpha_p$.



Fig. 4. The reduction of the convergence problems to period problems, calling the
procedure Compute-Limit of Figure 3. Periodicities of directed graphs are easy to
compute (see, e.g., [10]).



considered in this paper are in general neither aperiodic nor strongly connected, 
we cannot apply this theorem directly. 



3 Reducing the Problem 

In this section we show how to reduce the convergence problems mentioned in
Section 2 to combinatorial problems of periodic sequences. The facts of this section
are all essentially known [7], but we state them to emphasize their algorithmic
aspects.

Let (V, A, s, O) be a POM, and suppose we are interested in the sequence
of probabilities $f_m$ that the system at time point m is in the set of observed
states O. Only states in terminal components can contribute to the value of the
accumulation points of $(f_m)$ (see [7]), since the probability that the POM is in
any other state converges to zero.

The main idea is to analyze each terminal component separately, computing
the periodic contribution of every terminal component to the probabilities $f_m$.
To this end we introduce the notion of a subchain of a POM: Given a set of
states S, we replace all the outgoing edges of states that do not have a path into
S by a self-loop.

Definition 2. Let (V, A, s, O) be a POM, and $S \subseteq V$ be a set of states. Then we
define the subchain (V, B, s, O) of A induced by S. The transition function
of B is defined for all $u, v \in V$ by

$$B(u, v) := \begin{cases} A(u, v) & \text{if there is a path in } A \text{ from } u \text{ into } S, \\ 1 & \text{if } u = v \text{ and there is no path in } A \text{ from } u \text{ into } S, \\ 0 & \text{otherwise.} \end{cases}$$

Obviously, an induced subchain is a POM as well. Let $l_1, \ldots, l_p$ be the
periodicities of the subchains $A_1, \ldots, A_p$ induced by the terminal components. Thus the
periodicity of the POM is $l = \mathrm{lcm}(l_1, \ldots, l_p)$.
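
The construction of an induced subchain can be sketched as follows in Python. The sketch follows the wording of Definition 2 and therefore reads an entry for the pair (u, v) as the weight of the edge from u to v; it also assumes that a state of S counts as having a (trivial) path into S. The nested-dictionary encoding and the function name `subchain` are hypothetical.

```python
def subchain(A, S):
    """Subchain of A induced by the set of states S.

    A maps every state u to a dict {v: probability of the edge u -> v}.
    States without a path into S keep only a self-loop of weight 1.
    """
    # Predecessor lists for a backward reachability search from S.
    preds = {u: set() for u in A}
    for u, row in A.items():
        for v, p in row.items():
            if p != 0:
                preds[v].add(u)
    can_reach = set(S)
    stack = list(S)
    while stack:
        v = stack.pop()
        for u in preds[v]:
            if u not in can_reach:
                can_reach.add(u)
                stack.append(u)
    return {u: dict(A[u]) if u in can_reach else {u: 1} for u in A}


# Hypothetical chain with two terminal components: {1} and {2, 3}.
chain = {0: {1: 0.5, 2: 0.5}, 1: {1: 1}, 2: {3: 1}, 3: {2: 1}}
print(subchain(chain, {1}))
```

In the subchain induced by {1}, the states 2 and 3 of the other terminal component are frozen by self-loops, so the periodic behaviour of that component no longer influences the analysis.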

Now for each of these subchains $A_i$ and for $0 \le j < l_i$, we define

$$\alpha_i[j] := \lim_{m \to \infty} \sum_{v \in O} \left(A_i^{m l_i + j} s\right)_v \qquad (1)$$

As we will see in the next proposition, these limits exist and can be computed. 
The proposition states that the global accumulation points can be computed 
using the periodic contributions of the subchains induced by the terminal com- 
ponents: 

Proposition 2. Let A be a POM, and $A_1, \ldots, A_p$ the subchains induced by the
terminal components $S_1, \ldots, S_p$. Let $l_i$ denote the periodicity of $A_i$, and
$l := \mathrm{lcm}(l_1, \ldots, l_p)$ the periodicity of A. Then for every $v \in S_i$ and every initial
distribution s the following limits exist and can be computed, and we have:

$$\lim_{m \to \infty} \left(A^{m l + j} s\right)_v = \lim_{m \to \infty} \left(A_i^{m l_i + j} s\right)_v \qquad (2)$$

In particular, $\lim_{m \to \infty} f_{ml+j} = \sum_{i=1}^{p} \alpha_i[j]$. A proof and an algorithm for
computing the $\alpha_i$ can be found in the full version of the paper at [4].

4 Periodic Sequences of Field Elements 

In the previous sections we saw how to compute certain characteristic periodic 
sequences of rational numbers that describe the long run behaviour of a POM 
with respect to the sequence of probabilities (fm) that the system is in an ob- 
served state. If we want to know whether this probability converges, it suffices 
to check whether all accumulation points are equal. In this section we present a
polynomial time algorithm that avoids checking by brute force the exponentially
many different entries in the periodic sequence of the sum.

A sequence $(\alpha[i])_{i=0}^{\infty}$ of elements over some set X is periodic if there is an
integer p > 0 so that $\alpha[m] = \alpha[m + p]$ for all $m \ge 0$. The least such integer
is called the period $|\alpha|$ of $\alpha$. Here we are interested in algorithmic problems for
periodic sequences over real (or rational) numbers, and thus we assume that X
is a field. In Section 2 we asked several questions concerning the convergence of
POMs. By the reduction of the previous section they correspond to the following
problems for periodic sequences of integers $\alpha_1, \ldots, \alpha_k$, where a periodic sequence
$\alpha$ is given by a finite sequence $\alpha[0], \ldots, \alpha[|\alpha| - 1]$ of elements in X:

Let $\beta$ be the sequence defined by $\beta[j] := \sum_{i=1}^{k} \alpha_i[j]$.

1. Check whether $|\beta| = 1$.

2. Determine the minimal element $\min\{\beta[i] \mid 0 \le i < |\beta|\}$ of the periodic
sequence $\beta$.

3. Determine the length $|\beta|$ of the sequence $\beta$.

4. Determine the entries of the periodic sequence $\beta$.

These problems are in fact polynomial time equivalent to the corresponding 
problems for POMs: Assume we are given a set of periodic integer sequences. It 
is then easy to specify a POM and an observation set such that the respective 
convergence problem leads to the corresponding period sum problem. 

For Problem 4, we have to measure the complexity of an algorithm in both the
input size n and $m := |\beta|$, because in this case the size of the output $\beta$ itself might
be exponential. The second problem turns out to be hard:

Proposition 3. The problem to determine whether the minimal element in the
sum of given periodic sequences is greater than or equal to a given value k is
coNP-complete.

This can be proven by reduction of the complement of the NP-complete problem 
simultaneous inequalities [8], which stays hard even if the numbers of the instance 
are represented in unary (by inspection of the NP-hardness proof given in [12]). 
A proof can be found in the full version of the paper available at [4] . In the next 
section we will show that there are efficient algorithms for Problems 1, 3 and 4.



5 An Efficient Algorithm for Periods over the Rationals 



The main idea of the algorithms for the period sum problems presented here is to
represent a periodic sequence $\alpha$ as the power series $P_\alpha(X)$ defined by
$\sum_{i=1}^{\infty} \alpha[i-1]\, X^{-i}$. Let $l := |\alpha|$; it is easy to verify the following closed representation of this
power series:

$$P_\alpha(X) = \frac{\sum_{i=0}^{l-1} \alpha[i]\, X^{l-i-1}}{X^{l} - 1}$$

Given any fraction of polynomials such that the denominator divides $X^l - 1$, this
fraction can be expanded to such a representation of a periodic sequence. The
sequence can then be determined by dividing the numerator by the denominator
with a polynomial division; this can easily be verified for a denominator
$X^l - 1$ and must therefore hold for any divisor of $X^l - 1$, because the result of a
polynomial division is invariant under expansion and cancelation of the fraction.
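
The step-by-step division can be sketched in Python as follows. The sketch assumes that the power series is read as a series in $X^{-1}$, so that long division of a numerator of lower degree by the denominator yields the sequence entries one by one; the coefficient-list encoding (lowest degree first) and the function names are assumptions of this sketch.

```python
from fractions import Fraction


def expand(numerator, denominator, terms):
    """Coefficients of X^-1, X^-2, ... in numerator / denominator.

    Polynomials are coefficient lists, lowest degree first, and
    deg(numerator) < deg(denominator) is assumed.
    """
    d = len(denominator) - 1
    rem = [Fraction(c) for c in numerator] + [Fraction(0)] * (d - len(numerator))
    out = []
    for _ in range(terms):
        rem = [Fraction(0)] + rem            # multiply the remainder by X
        q = rem[d] / denominator[d]          # next coefficient of the series
        out.append(q)
        rem = [r - q * c for r, c in zip(rem, denominator)][:d]
    return out


def fraction_of(alpha):
    """Numerator and denominator X^l - 1 representing the periodic sequence alpha."""
    l = len(alpha)
    numerator = list(reversed(alpha))        # alpha[i] is the coefficient of X^(l-i-1)
    denominator = [-1] + [0] * (l - 1) + [1]
    return numerator, denominator


num, den = fraction_of([1, 2, 3])
print(expand(num, den, 6))                   # the entries 1, 2, 3, 1, 2, 3
# (X + 1) / (X^2 - 1) and its canceled form 1 / (X - 1) give the same sequence.
print(expand([1, 1], [-1, 0, 1], 4) == expand([1], [-1, 1], 4))  # prints True
```

The last line illustrates the invariance under cancelation: the constant sequence of period one is recovered from either representation.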

For the period problems we are given the sequences $\alpha_1, \ldots, \alpha_k$, and want to
analyze their sum. Adding up the fractions $P_{\alpha_1}(X), \ldots, P_{\alpha_k}(X)$ yields a fraction
representation $\frac{u}{v}$ of the sum of the sequences. We would like to compute the
potentially exponential period of $\frac{u}{v}$ without actually computing the entries of
the periodic sequence. For the denominator v, we first compute the least common
multiple of the denominators of all summands; since these are of the form $X^l - 1$,
their zeroes are exactly all l-th roots of unity. We can therefore represent v by
the list of its roots of unity. The entries of the list are stored as fractions $\frac{p_j}{q_j}$ such
that $z_j = \exp(2\pi i \, p_j / q_j)$ is the j-th root of unity in the list. For each $z_j$ in the list,
we test whether u also evaluates to 0 at $z_j$. If so, $z_j$ can be canceled, and we
eliminate it from the list.

We would like to compute the period of the resulting representation, i.e.
the minimal m such that the fraction has denominator $X^m - 1$. Since every
remaining $z_j$ in the list is an l-th root of unity for every multiple l of $q_j$, it suffices to find
the minimal m that is a multiple of every $q_j$. Thus the period m is the least
common multiple of the $q_j$.

If we are given periodic sequences of rational numbers, testing whether the 
numerator is zero at a root of unity can be done numerically. We first calculate 
the greatest common divisor of the numerator and the denominator, which is 
guaranteed to have only roots of unity as zeros. The minimal distance between 
two roots of unity is $2\pi$ times the distance of their representing fractions, which
is limited by the inverse of the input size, and we can therefore test whether 
the polynomial contains a certain root of unity with a linear number of bits of 
precision. To approximate the values at the roots of unity up to n bits we need 
0(n^ log n) time. 

To actually compute the entries of the sequence sum (Problem 4) we perform
the division of $\frac{u}{v}$ step by step, and stop after m steps.

Proposition 4. Let $n := |\alpha_1| + \cdots + |\alpha_k|$ be the size of a set of periodic
integer sequences, and $m := |\alpha_1 + \cdots + \alpha_k|$ the length of their sum. Then the
problem to calculate m is in $O(n^2 \log n \log\log n)$. Computing the entries of the
sum takes $O(n^2 \log n \log\log n + m)$ operations. In particular, we can check in
$O(n^2 \log n \log\log n)$ whether the sum has period one.

Proof. The dominating step with respect to the input size n is the reduction
of the at most n fractions to higher terms before adding them: Assuming an
$n \log n \log\log n$ multiplication algorithm (see e.g. [5]), the algorithm runs within
$O(n^2 \log n \log\log n)$. For large m, the performance of the division is the
bottleneck, requiring m operations. □

6 Conclusion 

We reduced the problem to determine the limit behaviour of regular languages 
to convergence problems for partially observable Markov chains. In this more 
general setting we reduced the problem to combinatorial period problems over 
fields that can be solved efficiently using power series representations. It is pos- 
sible to efficiently compute the potentially exponential number of accumulation 
points of the density of a regular language given by a deterministic automaton. 
Moreover, we presented an algorithm that computes the accumulation points in
time polynomial in the input and output. The overall running time of the algo- 
rithms for the tractable cases is dominated by the reduction to period problems 
over rational numbers, which involves the solution of a linear equation system. 

If the language is given by a nondeterministic automaton, we proved that
the density problem is PSPACE-hard; we do not know whether it is PSPACE- 
complete. We would also like to know the computational complexity of checking 
whether a context free language, given by a generating grammar, is dense, i.e., 
its density converges to one. 



References 

1. A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of
Algorithms. Addison-Wesley, 1974.

2. J. Berstel. Sur la densité asymptotique de langages formels. ICALP, pages 345-358,
1972.

3. R. N. Bhattacharya and E. C. Waymire. Stochastic Processes with Applications.
Wiley Series in Probability and Mathematical Statistics, New York, 1990.

4. M. Bodirsky, T. Gärtner, T. von Oertzen, and J. Schwinghammer. Efficiently
computing the density of regular languages. Full version, available under
http://www.informatik.hu-berlin.de/~bodirsky/publications.

5. P. Bürgisser, M. Clausen, and M. Shokrollahi. Algebraic Complexity Theory.
Springer-Verlag, 1997.

6. L. de Alfaro. How to specify and verify the long-run average behavior of
probabilistic systems. In Proc. 13th IEEE Symp. on Logic in Computer Science, IEEE
Computer Society Press, 1998.

7. F. R. Gantmacher. The Theory of Matrices. Chelsea Pub. Co., 1977.

8. M. Garey and D. Johnson. A Guide to NP-completeness. CSLI Press, 1978.

9. E. Grandjean. Complexity of the first-order theory of almost all finite structures.
Information and Control, 57:180-204, 1983.

10. K. Mehlhorn and S. Näher. LEDA. A platform for combinatorial and geometric
computing. Cambridge University Press, Cambridge, 1999.

11. A. Salomaa and M. Soittola. Automata-Theoretic Aspects of Formal Power Series.
Springer-Verlag, 1978.

12. L. J. Stockmeyer and A. Meyer. Word problems requiring exponential time. In
Proc. 5th Ann. ACM Symp. on Theory of Computing, number 1-9 in Association
of Computing Machinery, 1972.




Longest Repeats with a Block of Don’t Cares 



Maxime Crochemore¹,², Costas S. Iliopoulos², Manal Mohamed², and
Marie-France Sagot³

¹ Institut Gaspard-Monge, University of Marne-la-Vallée,
77454 Marne-la-Vallée CEDEX 2, France
maxime.crochemore@univ-mlv.fr
² Department of Computer Science, King’s College London
London WC2R 2LS, England
{mac, csi, manal}@dcs.kcl.ac.uk
³ Inria Rhône-Alpes, Laboratoire de Biométrie et Biologie Évolutive,
Université Claude Bernard, 69622 Villeurbanne cedex, France
Marie-France.Sagot@inria.fr



Abstract. We introduce an algorithm for extracting all longest repeats 
with k don’t cares from a given sequence. Such repeats are composed of 
two parts separated by a block of k don’t care symbols. The algorithm 
uses suffix trees to fulfill this task and relies on the ability to answer the 
lowest common ancestor queries in constant time. It requires O(n log n)
time in the worst case.

Keywords: Combinatorial Problems, String, Repeat Extraction, Don’t
Care, Suffix Tree, Lowest Common Ancestor, Efficient Merging.



1 Introduction 

In recent years, many combinatorial problems that originate in bioinformatics 
have been studied. Here we consider a combinatorial problem on motifs. The 
term motif [5] is often used in biology to describe similar functional components 
that several biological sequences have in common. It can also be used to describe 
any collection of similar substrings of a longer sequence. In nature, many motifs 
are composite, i.e. they are composed of conserved parts separated by random 
regions of variable lengths. 

In this paper we explore a sub-problem that is important in the approach 
to the combinatorics and the complexity of the original biologically motivated 
topic. Thus, we concentrate on finding all longest repeats with a block of k don’t 
cares. Such repeats consist of two exact parts separated by a gap of fixed length 
k. Hence, our aim is to find all such repeats and their positions in the string. 
A closely related problem was studied by Brodal et al. [2]. They developed
algorithms for finding all “maximal pairs with bounded gap”. This notion refers
to a non-extendable substring having two occurrences within a bounded
distance of each other. A restricted version of the same problem was considered by
Kolpakov and Kucherov [7]. They proposed an algorithm for a fixed gap. The 
problem of finding longest repeats with no don’t cares is a mere application of 
suffix trees [5]. 

In our method we use two suffix trees intensively, one for the original string 
and the other for its reverse. The use of a generalized suffix tree (for both the 
string and its reverse) would be possible but is not necessary because we do not 
need all the information it contains. We have not yet explored the possibility of 
using an affix tree [9] but there is some doubt that it will lead to a significant
improvement on the asymptotic time complexity. 

The paper is organized as follows: in Section 2, we state the preliminaries 
used throughout the paper. In Section 3, we define the longest repeat with k 
don’t cares and describe in general how to find them using two suffix trees. In 
Section 4, we detail our algorithm. Finally in Section 5, we analyze the running 
time of the algorithm. 

2 Preliminaries 

Throughout the paper x denotes a string of length n defined on a finite alphabet
$\Sigma$. We use $x[i]$, for $i = 1, 2, \ldots, n$, to denote the i-th letter of x, and $x[i..j]$ as
a notation for the substring $x[i]x[i+1] \cdots x[j]$ of x. The string $\overleftarrow{x}$ denotes the
reverse of x, such that $\overleftarrow{x}[1] = x[n], \ldots, \overleftarrow{x}[n] = x[1]$.

The length of a string w is denoted by $|w|$. If $w = uv$ then w is said to be the
concatenation of the two strings u and v. The string $w^k$ is the k-th power of w.

A symbol ‘∘’ $\notin \Sigma$ is called a “don’t care”; any other symbol is called solid.
A don’t care matches any other symbol, that is, $\circ = \sigma$ for each $\sigma \in \Sigma \cup \{\circ\}$. A
pattern y over $\Sigma \cup \{\circ\}$ is said to occur in x at position i if $y[j] = x[i + j - 1]$,
for $1 \le j \le |y|$. A motif w denotes a pattern that occurs at least twice in x.
We restrict the motifs to have a solid symbol at both ends, i.e., $w[1] \neq \circ$ and
$w[|w|] \neq \circ$. The set $\mathcal{L}_w$ is the set of occurrence positions of a given motif w,
where $\mathcal{L}_w = \{\, i \mid x[i..i + |w| - 1] = w,\ 1 \le i \le n - |w| + 1 \,\}$. Observe that $|\mathcal{L}_w| \ge 2$.

For a given string x and an integer k, a motif w of the form $L \circ^k R$ is called a
repeat with k don’t cares. The substrings L and R, respectively, are the left and
right parts of w. The length of the longest such repeat in x is denoted by $lr_k(x)$.
Later on, we use the following notion: a motif w is called left maximal (resp. 
right maximal) if w can not be extended to the left (resp. right) without losing 
one of its occurrences. 

Here we present a method for finding all longest repeats with k contiguous 
don’t cares and their positions. This method uses the suffix tree of x as a
fundamental data structure. A complete description of suffix trees is beyond the
scope of this paper, and can be found in [5] or [4]. However, for the sake of 
completeness, we will briefly review the notion. 
Definition 1 (Suffix tree). The suffix tree $\mathcal{T}(x)$ of the string x is the
compacted trie of all suffixes of $x\$$, where $\$ \notin \Sigma$. Each leaf in $\mathcal{T}(x)$ represents a
suffix $x[i..n]$ of x and is labelled with the index i. We refer to the set of indices
stored at the leaves of the subtree rooted at node v as the leaf-list of v; it is
denoted by $LL(v)$. Each edge in $\mathcal{T}(x)$ is labelled with a nonempty substring of x
such that the path from the root to the leaf labelled with index i spells the suffix
$x[i..n]$. We refer to the substring of x spelled by the path from the root to a node
v as the label of v, and denote it by $\ell_v$. The length of such a substring is the
depth of v and we denote it by $d_v$.

Several algorithms construct the suffix tree $\mathcal{T}(x)$ in $O(n)$ time, assuming an
alphabet of fixed size (see for example [4], [5]). All the internal nodes in $\mathcal{T}(x)$
have an out-degree between 2 and $|\Sigma|$. Therefore, we can transform the suffix
tree into a binary suffix tree $\mathcal{B}(x)$ by replacing every node v in $\mathcal{T}(x)$ with
out-degree $d > 2$ by a binary tree with $d - 1$ internal nodes and $d - 2$ internal edges,
where the d leaves are the d children of v. Since $\mathcal{T}(x)$ has n leaves, constructing
the binary suffix tree $\mathcal{B}(x)$ requires adding at most $n - 2$ new nodes. Each new
node can be added in constant time. This implies that the binary suffix tree $\mathcal{B}(x)$
can be constructed in $O(n)$ time.

Our method makes use of the Schieber and Vishkin [8] Lowest Common 
Ancestor algorithm. For a given rooted tree T, the lowest common ancestor of 
two nodes u and v, lca(u, v), is the deepest node in T that is an ancestor of both
u and v. After a linear-time preprocessing of a rooted tree, the lowest common
ancestor of any two nodes can be found in constant time. 



3 Longest Repeats with k Don’t Cares 

The longest repeats with k don’t cares problem requires finding all longest
repeats of the form $L \circ^k R$ that appear in a given string x. In the notation, L and
R are both over $\Sigma$ and represent the left and the right parts, respectively, of the
repeat. The parameter k is a given positive integer smaller than n. For example, if

x = BBAZYABAAAXBBAXZABAZAHIABAA

then the only longest repeat with 2 don’t cares is w = BBA∘∘ABA and its
occurrence list is $\mathcal{L}_w = \{1, 12\}$. Thus, $lr_2(x) = 8$. An obvious approach to solve
this problem is as follows:

1. generate all possible repeated substrings in x; 

2. for each pair of repeated substrings u and v, check whether there exist at
least two pairs of occurrence positions $i_1$ and $i_2$ of u and $j_1$ and $j_2$ of v
such that $j_l = i_l + |u| + k$ for $l = 1, 2$;

3. calculate the length of the repeat with k don’t cares $u \circ^k v$;

4. report all longest ones. 

This straightforward approach can be improved by dynamic programming 
yielding an O(n^) time algorithm. 
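
For small inputs, the problem statement can be checked with a direct brute-force search. The sketch below is not the dynamic-programming algorithm mentioned above and makes no attempt at efficiency; the function name `longest_repeats_bruteforce` is hypothetical, and the sketch assumes that both L and R must be nonempty and that occurrences may overlap.

```python
def longest_repeats_bruteforce(x, k):
    """Return (length, matches) for the longest repeats L o^k R in x.

    `matches` is a set of tuples (L, R, p, q), where p < q are 1-based
    start positions of two occurrences of the repeat.
    """
    n = len(x)
    best, matches = 0, set()
    for p in range(n):
        for q in range(p + 1, n):
            a = 0
            # Grow the common left part one character at a time.
            while q + a < n and x[p + a] == x[q + a]:
                a += 1
                rp, rq = p + a + k, q + a + k  # starts of the right parts
                b = 0
                while rq + b < n and x[rp + b] == x[rq + b]:
                    b += 1
                if b == 0:
                    continue  # the right part must be nonempty
                total = a + k + b
                if total > best:
                    best, matches = total, set()
                if total == best:
                    matches.add((x[p:p + a], x[rp:rp + b], p + 1, q + 1))
    return best, matches
```

On the example string with k = 2 this reports length 8 with the single match `("BBA", "ABA", 1, 12)`, in agreement with the occurrence list {1, 12} given above.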
Fig. 1. The suffix tree of GCCTAXXXGCATA.



Our approach proceeds differently and results in an $O(n \log n)$ time algorithm.
It starts by constructing the two suffix trees $T(x)$ and $T(\overleftarrow{x})$,
where $\overleftarrow{x}$ is the reverse of x. The first suffix tree is used to
generate the right part of the repeat, while the second suffix tree generates the
left part. Observe that the label $\ell_u$ of each internal node $u \in T(x)$
represents a right-maximal repeated substring of x which occurs at $LL(u)$.
Similarly, the label $\ell_v$ for each internal node $v \in T(\overleftarrow{x})$
represents the reverse of a left-maximal repeated substring of x ending at positions
$\{ j \mid j = n + 1 - i,\ i \in LL(v) \}$.

For simplicity, we replace each index i in $T(\overleftarrow{x})$ by
$n + 1 - i + (k + 1)$. Our goal now is to traverse both trees efficiently to find
all pairs of nodes u and v where $u \in T(x)$, $v \in T(\overleftarrow{x})$,
$|LL(u) \cap LL(v)| \ge 2$, and $d_u + d_v$ is maximum. For each such pair u and v,
the concatenation of the reverse of the label of v, k don’t cares, and the label
of u gives a longest repeat with k don’t cares, i.e.,
$w = \overleftarrow{\ell_v} \circ^k \ell_u$. Observe that $lr_k(x) = d_v + k + d_u$.

For example, if x = GCCTAXXXGCATA and k = 1, then Fig. 1 and Fig. 2
represent the suffix trees of, respectively, x and $\overleftarrow{x}$. Note that
each index i in $T(\overleftarrow{x})$ has been replaced by $16 - i$. The node in
$T(x)$ labelled by TA and the node in $T(\overleftarrow{x})$ labelled by CG both
have leaf-list {4, 12}. Thus, $GC \circ TA$ is a repeat with 1 don’t care. Since it
is the longest such repeat, $lr_1(x)$ equals 5. The list of occurrence positions of
the longest repeat with one don’t care of x is {1, 9}.
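
The index arithmetic of this example can be replayed without building any suffix tree. In the following Python sketch, plain substring search stands in for the leaf-lists of the two nodes; the helper `occurrences` is a hypothetical name used only for illustration.

```python
def occurrences(text, word):
    """1-based start positions of `word` in `text` (overlaps allowed)."""
    return {i + 1 for i in range(len(text) - len(word) + 1)
            if text.startswith(word, i)}


x, k = "GCCTAXXXGCATA", 1
n = len(x)
x_rev = x[::-1]                      # ATACGXXXATCCG

# Leaf-list of the node labelled TA in the suffix tree of x.
right = occurrences(x, "TA")
# Leaf-list of the node labelled CG in the suffix tree of the reverse,
# with every index i replaced by n + 1 - i + (k + 1), i.e. 16 - i here.
left = {n + 1 - i + (k + 1) for i in occurrences(x_rev, "CG")}

shared = right & left
# Each shared index is where TA starts; the repeat GC o TA starts
# k + |GC| positions earlier.
starts = {j - k - len("CG") for j in shared}
```

Both leaf-lists come out as {4, 12}, and the derived start positions are {1, 9}, as stated in the text.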



4 Algorithm 

Fig. 2. The suffix tree of ATACGXXXATCCG. Each index i is replaced by 16 - i. The gray
nodes may be omitted.

The initialization phase of the algorithm consists of two main steps. In the first
step, the suffix tree of x is constructed and then traversed in a preorder manner
where a number is assigned to each node. For each index i, $no(i)$ is the preorder
number assigned to the leaf node v labelled with i in $T(x)$. This is done during
the tree depth-first traversal. For example, if $T(x)$ is the tree of Fig. 1 then

In the second step, the suffix tree of $\overleftarrow{x}$ is built. In addition, a
list is associated with each leaf node v. For each leaf node v labelled with the
i-th suffix of $\overleftarrow{x}$, this list is initialized with the element
$no(n + 1 - i + (k + 1))$.

For each internal node v, the list is the sorted union of the disjoint lists of the
children of v. The computation of the lists for the internal nodes can be done
during a depth-first traversal of the tree. However, in order to guarantee an
efficient merge of the lists associated with the children of a node,
$T(\overleftarrow{x})$ is transformed into a binary suffix tree
$B(\overleftarrow{x})$. Furthermore, to maintain these lists efficiently, they are
implemented using AVL trees [1]. Although this implementation is similar to the one
used in [2] and [6], any other type of balanced search tree may be used. Note that
the efficient merging of two AVL trees is essential to our method. The results on
the merge operations of two height-balanced trees stated in [3] are summarized in
the following lemmas.

Lemma 1. Two AVL trees of size at most n and m can be merged in time
$O\left(\log \binom{n+m}{n}\right)$.

Lemma 2. Given a sorted list of elements $e_1 < e_2 < \cdots < e_n$, and an AVL tree
T of size at most m, where $m \ge n$, we can find
$q_i = \max\{x \in T \mid x \le e_i\}$ for all $i = 1, 2, \ldots, n$ in time
$O\left(\log \binom{n+m}{n}\right)$.

Proof. The basic idea is to use the merge algorithm of Lemma 1 while keeping
the positions in T where the elements $e_i$ would be inserted. This change
in the merge algorithm does not affect the time complexity and as a result we
can find all $q_i$ in $O\left(\log \binom{n+m}{n}\right)$ time.
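
The query of Lemma 2 can be sketched in Python with a sorted list standing in for the AVL tree. This is only an illustration of what is computed: each `bisect` call costs $O(\log m)$, so the sketch does not achieve the bound of Lemma 2. The function name `neighbours` is hypothetical.

```python
from bisect import bisect_right


def neighbours(sorted_values, a):
    """Return (b, c) for a query value a.

    b is the largest element <= a and c is the element that follows b;
    either may be None when no such element exists.
    """
    pos = bisect_right(sorted_values, a)
    b = sorted_values[pos - 1] if pos > 0 else None
    c = sorted_values[pos] if pos < len(sorted_values) else None
    return b, c
```

For instance, `neighbours([2, 5, 9], 6)` returns `(5, 9)`: the value 6 would be inserted between 5 and 9.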

Using the smaller-half trick, which states that “the sum over all nodes v of an
arbitrary binary tree of terms that are $O(n_1)$, where $n_1$ and $n_2$ are the
numbers of leaves in the subtrees rooted at the children of v and $n_1 \le n_2$, is
$O(n \log n)$”, the following lemma stated in [2] is easy to prove:

Lemma 3. Let T be an arbitrary binary tree with n leaves. The sum over all
internal nodes v in T of terms $\log \binom{n_1+n_2}{n_1}$, where $n_1$ and $n_2$ are
the numbers of leaves in the subtrees rooted at the two children of v, is
$O(n \log n)$.
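
Both sums can be checked numerically on a random tree shape. The Python sketch below is an illustration rather than part of the paper; the function name `random_leaf_split_sums` is hypothetical. It relies on one observation that follows from $\binom{n_1+n_2}{n_1} = (n_1+n_2)!/(n_1!\,n_2!)$: the logarithms telescope over the tree, so their sum equals $\log_2 n!$ for every tree shape.

```python
import math
import random


def random_leaf_split_sums(n, rng):
    """Split n leaves recursively at random.

    Returns (sum of min(n1, n2), sum of log2 C(n1 + n2, n1)) taken over
    all internal nodes of the resulting binary tree.
    """
    small_sum, log_sum = 0, 0.0
    stack = [n]
    while stack:
        size = stack.pop()
        if size < 2:
            continue  # a leaf contributes nothing
        n1 = rng.randint(1, size - 1)
        n2 = size - n1
        small_sum += min(n1, n2)
        log_sum += math.log2(math.comb(size, n1))
        stack.extend((n1, n2))
    return small_sum, log_sum
```

Calling it with n = 1000 gives a first sum of at most $n \log_2 n$ and a second sum equal to $\log_2 1000!$ up to rounding, whatever random splits are chosen.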

The algorithm for finding all longest repeats with k don’t cares is given in
Fig. 3. Recall that at every node v in $B(\overleftarrow{x})$ we construct a sorted
list, stored in an AVL tree A, of all the preorder numbers associated with the
elements in $LL(v)$. This list can be considered as a leaf-list sorted according to
the preorder numbers associated to the indices in $T(x)$. If v is a leaf, then A is
constructed directly (Line 5). If v is an internal node, then A is constructed by
merging $A_1$ and $A_2$ (Line 27), where $A_1$ and $A_2$ are the AVL trees associated
with the two children of v and $|A_1| \le |A_2|$. Before constructing A, we use $A_1$
and $A_2$ to check for an occurrence of a longest repeat with k don’t cares. If a
number a in $A_1$ is going to be inserted between b and c in $A_2$, then b and c are
efficiently obtained (Lemma 2). Let max be the length of the current longest repeat
with k don’t cares, and let u and v be the nodes representing this longest repeat,
where $u \in T(x)$ and $v \in B(\overleftarrow{x})$. Since we are moving upward in
$B(\overleftarrow{x})$, minimizing the depth of v, the only way to find a longer
repeat with k don’t cares is by replacing node u in $T(x)$ with a node that has
greater depth. Clearly, this node should be a lowest common ancestor of a pair of
nodes that has not been considered so far, i.e. a pair consisting of an element in
$A_1$ and an element in $A_2$. It follows from Lemma 4 that we do not need to
consider all the possible new pairs: only the pairs of the form (a, b) or (a, c)
need to be considered. For each pair of nodes considered by the algorithm, the
algorithm checks whether the sum of the depths of both nodes is greater than or
equal to max. If so, the algorithm uses list M to store the pair. Note that the
longest repeat with k don’t cares may not be unique. So, each pair (x, y) in M
represents a longest repeat obtained by a concatenation of $\overleftarrow{\ell_y}$,
k don’t cares, and $\ell_x$, where $lr_k(x)$ equals $d_x + k + d_y$ for all pairs
$(x, y) \in M$.

Lemma 4. Let i, j and k be the preorder numbers given to three leaves u, v and
w during a preorder traversal of a rooted tree T. If $i < j < k$, then the depth
of $lca(u, v)$ cannot be less than the depth of $lca(u, w)$, where lca is the lowest
common ancestor of two nodes.
Algorithm Longest-Repeat-Don’t-Cares(x, k)

Input: A string x of length n and an integer k

Output: All longest repeats with k contiguous don’t cares

 1. Build the suffix tree $T(x)$ and traverse the tree in preorder manner
    numbering all the nodes.
 2. for each leaf $v \in T(x)$
 3.     if v is labelled with i
 4.         then $no(i) \leftarrow$ the preorder number of v
 5. Build the binary suffix tree $B(\overleftarrow{x})$ and create at each leaf an
    AVL tree of size one that stores $no(n + 1 - i + (k + 1))$, where i is the
    index associated with the leaf.
 6. $(max, u, v) \leftarrow (0, root(T(x)), root(B(\overleftarrow{x})))$
 7. $M \leftarrow \emptyset$
 8. for each node $v \in B(\overleftarrow{x})$ in bottom-up (depth-first) manner
 9.     $A_1, A_2 \leftarrow$ the AVL trees of the two children of v where $|A_1| \le |A_2|$
10.     for $a \in A_1$ in ascending order
11.         $b \leftarrow \max\{x \in A_2 \mid x \le a\}$
12.         $ab \leftarrow lca(no^{-1}(a), no^{-1}(b))$ in $T(x)$
13.         if $d_{ab} + d_v = max$
14.             then $(u, v) \leftarrow (ab, v)$
15.                  $M \leftarrow M \cup (ab, v)$
16.         else if $d_{ab} + d_v > max$
17.             then $(max, u, v) \leftarrow (d_{ab} + d_v, ab, v)$
18.                  $M \leftarrow (ab, v)$
19.         $c \leftarrow next(A_2, b)$
20.         $ac \leftarrow lca(no^{-1}(a), no^{-1}(c))$ in $T(x)$
21.         if $d_{ac} + d_v = max$
22.             then $(u, v) \leftarrow (ac, v)$
23.                  $M \leftarrow M \cup (ac, v)$
24.         else if $d_{ac} + d_v > max$
25.             then $(max, u, v) \leftarrow (d_{ac} + d_v, ac, v)$
26.                  $M \leftarrow (ac, v)$
27.     $A \leftarrow merge(A_1, A_2)$
28. $lr_k \leftarrow max + k$
29. return $(lr_k, M)$

Fig. 3. All longest repeats with k don’t cares algorithm.



Proof. The proof is by contradiction. Let x and y be lca(u,v) and lca(u,w), 
respectively. Assume that the depth of x is less than the depth of y. Since i 
is less than j, k also must be less than j, which contradicts the condition that 
i < j < k. 

The depth of a node in Lemma 4 is the length of the path from the root
to this node. It is quite easy to see that the lemma can be extended to suffix
trees where the depth of a node is the length of the substring spelled by the path
from the root to this node.
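
Lemma 4 can also be checked exhaustively on a small tree. In the Python sketch below, a tree is encoded as nested lists (an empty list is a leaf) and each leaf is identified by its path of child indices; this encoding and the function names are assumptions made for the illustration.

```python
from itertools import combinations


def leaf_paths(tree, path=()):
    """Yield the root-to-leaf paths (tuples of child indices) in preorder."""
    if not tree:  # an empty list encodes a leaf
        yield path
        return
    for idx, child in enumerate(tree):
        yield from leaf_paths(child, path + (idx,))


def lca_depth(p, q):
    """Depth (number of edges) of the lowest common ancestor of two leaves."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def lemma4_holds(tree):
    """Check depth(lca(u, v)) >= depth(lca(u, w)) for all leaves u < v < w."""
    leaves = list(leaf_paths(tree))  # list order is the preorder of the leaves
    return all(lca_depth(u, v) >= lca_depth(u, w)
               for u, v, w in combinations(leaves, 3))
```

For example, `lemma4_holds([[[], []], [[], [[], []]], []])` evaluates to `True`.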
5 Time Complexity

In this section, we analyze the running time of the algorithm. Recall that, for a
constant-size alphabet, a suffix tree can be built in linear time. Thus, creating
$T(x)$ and performing the preorder traversal at Line 1 requires $O(n)$ time. The
loop on Lines 2-4 takes $O(n)$ time. Building $B(\overleftarrow{x})$ also takes
$O(n)$ time. Creating an AVL tree of size one can be done in constant time. Thus,
doing so at each of the n leaves of $B(\overleftarrow{x})$ at Line 5 requires a total
of $O(n)$ time. Lines 6 and 7 take $O(1)$ time.

The algorithm then traverses $B(\overleftarrow{x})$ in depth-first manner (Lines
8-27). At every internal node v, the algorithm runs a search loop on Lines 10-26
and then performs a merge at Line 27. Let $A_1$ and $A_2$ be the two AVL trees
associated with the two children of v where $|A_1| \le |A_2|$. During the search loop
(Lines 10-26), for each $a \in A_1$, the algorithm searches $A_2$ to find b and c.
According to Lemma 2, the time required to complete the search loop at each node is
$O\left(\log \binom{|A_1|+|A_2|}{|A_1|}\right)$. Additionally, Lemma 1 states that
the merge at Line 27 also takes $O\left(\log \binom{|A_1|+|A_2|}{|A_1|}\right)$
time. Summing these terms over all the internal nodes of $B(\overleftarrow{x})$ gives
the total running time of the tree traversal (Lines 8-27), that is $O(n \log n)$
(Lemma 3). Thus, the total running time of the algorithm is $O(n \log n)$. The
following theorem states the result.

Theorem 1. Algorithm Longest-Repeats-Don’t-Cares extracts all longest repeats
with k don’t cares from a given string in $O(n \log n)$ time.
Abstract. We call a pseudovariety $\mathbf{V}$ finite join irreducible if

$\mathbf{V} \le \mathbf{V}_1 \vee \mathbf{V}_2 \implies \mathbf{V} \le \mathbf{V}_1$ or $\mathbf{V} \le \mathbf{V}_2$.

We present a large class of group mapping semigroups generating finite
join irreducible pseudovarieties. We show that many naturally occurring
pseudovarieties are finite join irreducible including: S, DS, CR, CS and
$\bar{\mathbf{H}}$, where $\mathbf{H}$ is a group pseudovariety containing a
non-nilpotent group.



1 Introduction 

The following results, appearing here for the first time, are part of the authors’ 
forthcoming book, “The q-theory of finite semigroups [11];” further results shall 
appear therein. 

All semigroups in this paper are assumed to be finite. Recall that a pseudova- 
riety of semigroups [4] is a class of semigroups closed under finite direct products, 
subsemigroups and homomorphic images. They play an important role in Formal 
Language Theory thanks to Eilenberg’s Variety Theorem [4], which establishes 
an isomorphism between the complete lattices PV of pseudovarieties of semi- 
groups and RAT of varieties of rational (=regular) languages. Since the join 
operation on RAT corresponds to closing under the Boolean operations, it is 
quite natural, from the Formal Language Theory point-of-view, to want to study 
these lattices. 

We now recall some basic notions from lattice theory [11] in order to state
our results. Fix a complete lattice L. We say that $l \in L$ is:

(ji) join irreducible if $l \le \bigvee_{i \in I} l_i \implies l \le l_i$, some $i \in I$;

* The second author was supported in part by NSERC and by the FCT and POCTI ap- 
proved project POCTI/32817/MAT/2000 in participation with the European Com- 
munity Fund FEDER 
(sji) strictly join irreducible if $l = \bigvee_{i \in I} l_i$ implies $l = l_i$, some $i \in I$;

(fji) finite join irreducible if $l \le l_1 \vee l_2 \implies l \le l_i$, some i;

(sfji) strictly finite join irreducible if $l = l_1 \vee l_2 \implies l = l_i$, some i.

The reader is cautioned that sfji is sometimes called join irreducible in the liter- 
ature, while fji is sometimes called co-prime. 
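
As a toy illustration of the fji condition, outside the setting of pseudovarieties, consider the finite lattice of divisors of 360 ordered by divisibility, where the join of two elements is their least common multiple. The Python sketch below enumerates its fji elements by brute force; the lattice and all names are chosen here purely for illustration.

```python
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def is_fji(l, lattice):
    """Check: l <= a v b implies l <= a or l <= b.

    Here <= is read as "divides" and v as the least common multiple.
    """
    return all(a % l == 0 or b % l == 0
               for a in lattice for b in lattice
               if lcm(a, b) % l == 0)


lattice = divisors(360)
fji = [l for l in lattice if is_fji(l, lattice)]
```

The list `fji` consists of 1 and the prime powers dividing 360. An element such as 6 fails the test because 6 divides lcm(2, 3) but divides neither 2 nor 3.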

Let S be a semigroup; then $HSP_f(S)$ denotes the pseudovariety generated
by S. If $\mathbf{V} \in PV$, then $\mathbf{V} = \bigvee_{S \in \mathbf{V}} HSP_f(S)$,
so any sji (and hence any ji) pseudovariety must be generated by a single semigroup.
It is straightforward to verify that a one-generated pseudovariety is ji
(respectively, sji) if and only if it is fji (respectively, sfji).

One reason that sfji is important is the following: An sfji (and hence fji)
pseudovariety that is not one-generated cannot have a maximal proper
subpseudovariety. Indeed, if $\mathbf{W}$ is a maximal proper subpseudovariety of
$\mathbf{V}$ and $S \in \mathbf{V} \setminus \mathbf{W}$, then
$\mathbf{V} = \mathbf{W} \vee HSP_f(S)$ implies $\mathbf{V} = HSP_f(S)$. On the
other hand, one can show that $HSP_f(S)$ is sfji if and only if it has a unique
maximal proper subpseudovariety [11]. We mention that there are naturally occurring
pseudovarieties that are not sfji, such as the pseudovariety of
$\mathcal{J}$-trivial semigroups [1,11]. Also, for example, the pseudovariety of
nilpotent semigroups is sfji but not fji [11].

Key to this paper is the notion of a Kovács-Newman semigroup. Roughly
speaking, a Kovács-Newman semigroup is a semigroup S such that every division
from S onto a subdirect product factors through a projection. Each such
semigroup generates a distinct join irreducible pseudovariety. We mention in
passing that we do not distinguish between isomorphic semigroups.

Our main results include the following. Let $\mathbf{H}$ be a pseudovariety of
groups containing a non-nilpotent group. Then $\bar{\mathbf{H}}$ (subgroups in
$\mathbf{H}$) is finite join irreducible as well as its intersections with DS
(regular $\mathcal{J}$-classes are semigroups) and CR (union of groups). Hence none
of these pseudovarieties of semigroups have maximal proper subpseudovarieties. In
particular, the pseudovarieties S of all semigroups, DS and CR are finite join
irreducible. The case of S was first obtained by Margolis et al. [10] using
profinite methods. They, in fact, showed that $\bar{\mathbf{H}}$ is sfji whenever
$\mathbf{H}$ is closed under semidirect product. Auinger and the second author [3,2]
have shown that a sufficient condition for a pseudovariety of groups $\mathbf{H}$ to
be fji is the following: For each $G \in \mathbf{H}$, there exists a group H so that
the wreath product $H \wr G$ belongs to $\mathbf{H}$. This is a relatively mild
condition and there are uncountably many such pseudovarieties; see [3]. Our methods
give an elementary proof for a large subclass of such pseudovarieties including the
pseudovariety of groups G.

2 The Group Case 

2.1 Preliminaries 



Recall that a non-empty partially ordered set D is said to be directed if any two 
elements of D have an upper bound. 
Lemma 2.1. Let L be a complete lattice and suppose that $D \subseteq L$ is a
directed subset of fji elements. Then $\bigvee D$ is fji.

Proof. Let $d = \bigvee D$ and suppose $d \le l_1 \vee l_2$. Suppose $d \not\le l_1$;
then $l \not\le l_1$ for some $l \in D$. Since l is fji, $l \le l_2$. Let $l' \in D$
and let $k \in D$ be an upper bound for l and l'. If $k \le l_1$, then $l \le l_1$, a
contradiction. Since k is fji, we conclude that $k \le l_2$. Hence $l' \le l_2$.
Since l' was arbitrary, we conclude $d \le l_2$, as desired. □

We shall also need a variant of this lemma. 

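Lemma 2.2. Let $\mathbf{V}$ be a pseudovariety such that, for each
$S \in \mathbf{V}$, there exists $S' \in \mathbf{V}$ such that:

1. $S \in HSP_f(S')$;

2. $S' \in \mathbf{V}_1 \vee \mathbf{V}_2 \implies S' \in \mathbf{V}_1$ or $S' \in \mathbf{V}_2$.

Then $\mathbf{V}$ is fji.

Proof. Let $S_0, S_1, S_2, \ldots$ be an enumeration of the elements of $\mathbf{V}$
with $S_0$ trivial. Set $T_0 = S_0$ and define, for $i > 0$,
$T_i = (T_{i-1} \times S_i)'$. Then, by (1), we have

$\forall i,\ T_i \in HSP_f(T_{i+1})$ and $S_i \in HSP_f(T_i)$ (2.1)

Suppose $\mathbf{V} \le \mathbf{V}_1 \vee \mathbf{V}_2$ and
$\mathbf{V} \not\le \mathbf{V}_1$. Then $S_i \notin \mathbf{V}_1$ for some i. Let
$j \ge i$; then $S_i \in HSP_f(T_j)$ by (2.1), so $T_j \notin \mathbf{V}_1$. Since
$T_j \in \mathbf{V}_1 \vee \mathbf{V}_2$, we deduce from (2) that
$T_{j-1} \times S_j \in \mathbf{V}_2$, $j \ge i$. Hence $T_k \in \mathbf{V}_2$ for
$k \ge i - 1$. We conclude, using (2.1), that $\mathbf{V} \le \mathbf{V}_2$, as
desired. □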

Let GPV be the lattice of group pseudovarieties. The following is a
straightforward consequence of the fact that groups “lift” under surjective
homomorphisms [5,4]; the proof is left as an exercise.

Proposition 2.3. If $\mathcal{P}$ is any of the properties ji, sji, fji or sfji,
then $\mathbf{H} \in GPV$ is $\mathcal{P}$ in GPV if and only if it is $\mathcal{P}$
in PV.

2.2 Kovacs-Newman Groups 

We refine slightly a result of Kovács and Newman [6, 7, 8, 9] showing that certain
pseudovarieties of groups are ji. By Proposition 2.3, we may restrict our attention
to the group setting.

Recall that a semigroup S divides a semigroup T if S is a quotient of a
subsemigroup of T [5,4].

A semigroup T is said to be a subdirect product of $T_1$ and $T_2$, written
$T \ll T_1 \times T_2$, if the projections from T to the $T_i$ are surjective. A
semigroup T is called subdirectly indecomposable if $T \ll T_1 \times T_2$ implies
that at least one of the projections $\pi_i : T \to T_i$ is an isomorphism. It is
not hard to see [9] that a group G is subdirectly indecomposable if and only if it
has a unique minimal (non-trivial) normal subgroup M, called its monolith [9];
sometimes G is called monolithic [9]. A pseudovariety is always generated by its
subdirectly indecomposable members.
In what follows, we identify the direct factors $G_1$ and $G_2$ of
$G_1 \times G_2$ with $G_1 \times 1$, $1 \times G_2$, respectively.

We call a non-trivial group G a Kovács-Newman group (or KN group for
short) if it has the following property: Whenever there is a diagram

$G \overset{\varphi}{\twoheadleftarrow} H \ll G_1 \times G_2$ (2.2)

$\varphi$ factors through one of the projections. Since it is clear that
$G \in \mathbf{H}_1 \vee \mathbf{H}_2$ if and only if there is a diagram as in (2.2)
with $G_1 \in \mathbf{H}_1$ and $G_2 \in \mathbf{H}_2$, it follows that if G is KN,
then $HSP_f(G)$ is fji and hence ji. Observe that if G is a KN group and
$G \in HSP_f(H)$, then G divides H. Indeed, G must divide a product of copies of H
and so, being a KN group, it divides H. In particular, two non-isomorphic KN groups
cannot generate the same pseudovariety.

We remark that there are ji pseudovarieties that are not generated by KN
groups. In fact, we show that if p is a prime, then $G = \mathbb{Z}_p$ is not a KN
group, but $HSP_f(\mathbb{Z}_p)$ is ji. To see that $HSP_f(\mathbb{Z}_p)$ is ji,
suppose $\mathbb{Z}_p$ divides $G_1 \times G_2$ with $G_i \in \mathbf{H}_i$,
i = 1, 2. Then p divides $|G_1| \cdot |G_2|$ and hence $|G_i|$ for some i. Thus
$\mathbb{Z}_p$ is a subgroup of $G_i$ and so $\mathbb{Z}_p \in \mathbf{H}_i$; we
conclude $HSP_f(\mathbb{Z}_p)$ is ji. The following proposition shows that KN groups
are centerless (and hence no element of $HSP_f(\mathbb{Z}_p)$ can be a KN group).

Proposition 2.4. Let G be a KN group. Then the center Z(G) is trivial. In
particular, no nilpotent group is a KN group.

Proof. Suppose $Z(G) \ne 1$ and let $H = G \times Z(G)$. Consider the onto
homomorphism $\varphi : H \to G$ given by $(g, z)\varphi = gz^{-1}$; it is
straightforward to verify that $\varphi$ is a homomorphism. However, $\varphi$ does
not factor through either projection; indeed,
$\ker \varphi = \{(g, g) \mid g \in Z(G)\}$ is the diagonal embedding of Z(G) and
intersects the two factors trivially. The last statement follows since nilpotent
groups have non-trivial centers. □
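
The construction in this proof can be replayed on the smallest interesting case. The Python sketch below takes $G = \mathbb{Z}_3$ written additively, so that $Z(G) = G$ and $gz^{-1}$ becomes $g - z$; this choice of group and all identifiers are assumptions made for illustration. It uses the standard fact that a map factors through a surjective projection exactly when the kernel of the projection is contained in the kernel of the map.

```python
from itertools import product

n = 3                                   # G = Z(G) = Z_3, written additively
H = list(product(range(n), repeat=2))   # H = G x Z(G)


def phi(h):
    g, z = h
    return (g - z) % n                  # additive form of (g, z) -> g z^{-1}


def is_homomorphism():
    return all(
        phi(((a + c) % n, (b + d) % n)) == (phi((a, b)) + phi((c, d))) % n
        for (a, b), (c, d) in product(H, repeat=2)
    )


kernel = {h for h in H if phi(h) == 0}
ker_pi1 = {h for h in H if h[0] == 0}   # kernel of the projection onto G
ker_pi2 = {h for h in H if h[1] == 0}   # kernel of the projection onto Z(G)


def factors_through_projection():
    # phi factors through pi_i exactly when ker(pi_i) lies inside ker(phi).
    return ker_pi1 <= kernel or ker_pi2 <= kernel
```

Here `kernel` is the diagonal {(0, 0), (1, 1), (2, 2)}, which contains neither projection kernel, so `factors_through_projection()` returns `False`.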

Kovács and Newman essentially showed that there is a large number of KN
groups. Clearly any KN group must be monolithic. The following proposition
about subdirect products is straightforward [9]; we omit the proof.

Proposition 2.5. Suppose $H \ll G_1 \times G_2$. Then a subgroup $N \le G_1$ is
normal in $G_1$ if and only if it is normalized by H. In particular, if
$K \lhd H$ and $N \lhd G_1$, then $K \cap N$ is normal in $G_1$.

Let us establish some notation. If $\varphi : H \to G$ is a homomorphism and
$N \lhd G$ is a normal subgroup, then H acts on N by first applying $\varphi$ and
then acting by conjugation. Let $C_H(N)$ denote the centralizer of N under this
action. Notice that $\ker \varphi \le C_H(N)$; in fact,
$C_H(N) = C_G(N)\varphi^{-1}$. Centralizers of minimal normal subgroups shall play
an important role in this paper thanks to Theorem 2.8 below.

Lemma 2.6. Suppose G is a monolithic group with non-Abelian monolith M.
Then $C_G(M)$ is trivial.




Join Irreducible Pseudovarieties 






Proof. If $C_G(M)$ is non-trivial, then $M \le C_G(M)$. But this implies that M is
Abelian. □



Another useful fact about minimal normal subgroups is the following. 

Lemma 2.7. Let G be a group and $M \lhd G$ be a minimal normal subgroup.
Then there exists a normal subgroup $N \lhd G$ such that $N \cap M = 1$ and G/N is
monolithic with monolith $MN/N \cong M$. Moreover, $C_G(M) = C_G(MN/N)$.

Proof. Let N be a maximal normal subgroup such that $N \cap M = 1$. Let
$N < K \lhd G$; then $K \cap M \ne 1$. Since M is minimal, we conclude $M \le K$. It
follows that MN/N is the unique minimal normal subgroup of G/N. Since
$M \cap N = 1$, $M \cong MN/N$.

Clearly $C_G(M) \le C_G(MN/N)$. For the converse, suppose $m \in M$ and
$g \in C_G(MN/N)$. Then $g^{-1}mg = mn$ with $n \in N$. So
$n = m^{-1}(g^{-1}mg) \in M \cap N = 1$. We conclude $g \in C_G(M)$. □

The following technical theorem on lifting minimal normal subgroups and 
their centralizers is essentially due to Kovacs and Newman [6,7,8]; our proof is 
adapted from [9, Chpt. 5, §3]. 

Theorem 2.8. Let G be a monolithic group with monolith M. Suppose that one
has a diagram as in (2.2). Let $\tau : G \to G/C_G(M)$ be the natural quotient map.
Then $\varphi\tau$ factors through one of the direct product projections.

Proof. Let $K = \ker \varphi$. To each pair $(N_1, N_2)$ of normal subgroups
$N_1 \lhd G_1$, $N_2 \lhd G_2$, we associate the positive number
$|G_1/N_1| + |G_2/N_2|$; this number is called the weight of the pair. Suppose
$(N_1, N_2)$ is of minimal weight with the property that there is a factorization:

$H \xrightarrow{\ \psi\ } H\psi \ll G_1/N_1 \times G_2/N_2$, with $\varphi = \psi\Phi$ for a map $\Phi : H\psi \to G$ (2.3)

That is, choose $(N_1, N_2)$ of minimal weight so that for $\psi$, as defined in
(2.3), $\ker \psi \le K$. Set $D_i = G_i/N_i$. Notice that if $D_1$ (respectively,
$D_2$) is trivial, then $\varphi$ (and hence $\varphi\tau$) factors through the
projection to $G_2$ (respectively, $G_1$) and we are done. So we assume from now on
that $D_1, D_2 \ne 1$. Let us fix the following notation: $H' = H\psi$,
$K' = K\psi$.

Fact 2.9. Every non-trivial normal subgroup of $D_i$ intersects H' non-trivially.

Proof. Suppose, without loss of generality, $N = N'/N_1$ (with $N' \supsetneq N_1$)
is a non-trivial normal subgroup of $D_1$ with $H' \cap N = 1$. Then
$H' \ll D_1/N \times D_2$ under the natural map and so we have a factorization

$H \to H' \ll G_1/N' \times G_2/N_2$ (with $\varphi : H \to G$ factoring through the first map)

where $(N', N_2)$ has strictly smaller weight than $(N_1, N_2)$ - contradiction. □

Fact 2.10. The normal subgroup K' of H' intersects $D_1$ and $D_2$ trivially.

Proof. Suppose, without loss of generality, that $N = K' \cap D_1 \ne 1$. By
Proposition 2.5, N is normal in $D_1$. Thus, recalling $N \le K'$, we obtain a
factorization

$H \to H'/N \ll D_1/N \times D_2$ (with $\varphi : H \to G$ factoring through the first map)

contradicting the minimality of the weight of $(N_1, N_2)$. □

We need one last fact before completing the proof. 

Fact 2.11. Let $\pi_i : H \to D_i$ be the projection and $L_i = \ker \pi_i$.
Suppose N is a minimal normal subgroup of $D_i$. Then $N \le H'$ and $\Phi$ takes N
isomorphically onto M. Moreover,

$L_i \le C_H(N) = C_H(M) = C_G(M)\varphi^{-1}$. (2.4)

Proof. By Fact 2.9, $N \cap H' \ne 1$. By Proposition 2.5, $N \cap H'$ is normal in
$D_i$. Hence, by minimality of N, $N \le H'$. Also Proposition 2.5 immediately
implies that N is a minimal normal subgroup of H'.

Since $N \le D_i$, Fact 2.10 shows that $K' \cap N = 1$. Thus
$\Phi : H' \to H'/K' = G$ is injective on N. Hence $N\Phi = N'$ is a minimal normal
subgroup of G. Indeed, if $N_0$ is a normal subgroup of G properly contained in N',
then $N_0\Phi^{-1} \cap N$ is a normal subgroup of H' properly contained in N, a
contradiction. Since G is monolithic, it follows $N\Phi = M$.

We are now left with proving (2.4). Without loss of generality, let us take i = 1.
Notice that there is an ambiguity in the notation $C_H(N)$ since we can either
view H as acting on N by first doing $\psi$ and then conjugating, or by first doing
$\pi_1$ and then conjugating. We show that the centralizers under either
interpretation are the same. Indeed, let $h \in H$ and $n \in N$. Set
$h\psi = (h_1, h_2) = (h\pi_1, h\pi_2)$. Then

$(h\psi)^{-1} n (h\psi) = (h_1^{-1}, h_2^{-1})(n, 1)(h_1, h_2) = (h_1^{-1} n h_1, 1)$,

so $h\psi$ centralizes N if and only if $h_1 = h\pi_1$ centralizes N.

It now follows that $L_1 = \ker \pi_1 \le C_H(N)$. Clearly if $h\psi$ centralizes N,
then $h\psi\Phi$ centralizes $N\Phi = M$. Thus $C_H(N) \le C_H(M)$. Suppose
$h \in C_H(M)$. Let $h\psi = (h_1, h_2)$ and let $n \in N$. Since $h\psi\Phi$
centralizes $n\Phi$,

$(h_1^{-1} n h_1, 1) = (h_1, h_2)^{-1}(n, 1)(h_1, h_2) = (h\psi)^{-1} n (h\psi) = nk$

some $k \in K'$. Hence $k = n^{-1} h_1^{-1} n h_1 \in K' \cap D_1 = 1$, by Fact 2.10.
Thus $h \in C_H(N)$. The equality $C_H(M) = C_G(M)\varphi^{-1}$ is elementary. □

Fact 2.11 completes the proof of the theorem since if $\rho_i : H \to G_i$ is the
projection, then $\ker \rho_i \le L_i \le C_H(M) = \ker \varphi\tau$, as desired. □

Hence we obtain one of our principal results¹.

Theorem 2.12. Let G be a monolithic group with non-Abelian monolith M.
Then G is a Kovács-Newman group.

Proof. Suppose one has a diagram as per (2.2). Then, by Theorem 2.8,
$\varphi\tau$ factors through a direct product projection (retaining the notation
of that theorem). But, by Lemma 2.6, $\tau$ is the identity map; we deduce that G
is a KN group. □

Corollary 2.13 (Kovács-Newman [6,7,8,9]). Each distinct monolithic
group with non-Abelian monolith generates a distinct join irreducible
pseudovariety.

2.3 Applications 

We use the above results to provide some fji group pseudovarieties, including a 
weaker version of the result of Auinger and the second author [2,3] mentioned 
earlier (but with a more elementary proof) . 

Let $A_n$ be the alternating group on n letters. It is well known that every
finite group embeds in $A_n$ for some $n \ge 5$ and that $A_n$ is simple
non-Abelian, $n \ge 5$, and hence a KN group by Theorem 2.12. Thus
$\mathbf{G} = \bigvee HSP_f(A_n)$ is fji by Lemma 2.1.

The authors are indebted to John Dixon for pointing out the following exam- 
ple. Set Gi = PSL(2, 2*); the Gi are simple non-Abelian groups and Gi < Gj+i, 
so Lemma 2.1 shows H = V HSPf (Gi) is fji. Since, for any q > 2, the g-subgroups 
of Gi are Abelian, H is a proper fji pseudovariety of groups. 

Corollary 2.14. Suppose $\mathbf{H}$ is a pseudovariety of groups such that, for
each $G \in \mathbf{H}$, there is a simple non-Abelian group H such that
$H \wr G \in \mathbf{H}$. Then $\mathbf{H}$ is finite join irreducible.

Proof. We use Lemma 2.2. Let $G \in \mathbf{H}$ and H be a simple non-Abelian group
such that $W = H \wr G \in \mathbf{H}$. Set $M = H^G \lhd W$; we claim that M is a
minimal normal subgroup. Indeed, suppose $1 \ne f \in M$; we show that the normal
closure N of f is M. Conjugating by an element of G, we may assume $1f \ne 1$. Since
H has trivial center, there exists $h \in H$ such that $h^{-1}(1f)h \ne 1f$. Define
$g \in M$ by $1g = h$, $h'g = 1$ for $h' \in G \setminus 1$. Set
$k = f(g^{-1}fg)^{-1}$; then $1k \ne 1$, $h'k = 1$ all $h' \in G \setminus 1$ and
$k \in N$. Since H is simple, it now follows that
$K = \{f \in M \mid h'f = 1,\ \forall h' \in G \setminus 1\} \le N$. But
$\langle g^{-1}Kg \mid g \in G \rangle = M$.

By Lemma 2.7, there is a normal subgroup $N \lhd W$ such that: $N \cap M = 1$,
$G' = W/N$ is monolithic with monolith $MN/N \cong M$ and
$C_W(MN/N) = C_W(M)$. Since G acts faithfully on M, $G \cap C_W(M) = 1$. Thus

$G \cap N \le G \cap C_W(MN/N) = G \cap C_W(M) = 1$,

and so $G \le G'$. Clearly (1) of Lemma 2.2 is satisfied; since M is non-Abelian,
G' is a KN group and so (2) is also satisfied. □

¹ After being shown a preprint of this paper, L. G. Kovács (private communication)
was able to prove the converse of Theorem 2.12.
3 The Semigroup Case

3.1 Preliminaries 

Before defining KN semigroups, we review some preliminary notions concerning
simple semigroups and minimal ideals. Each semigroup S has a minimal ideal
K(S), sometimes called its kernel, which is a (completely) simple semigroup [5].
Moreover, if $\varphi : S \to T$ is an onto homomorphism, then
$K(S)\varphi = K(T)$ [5].

Notice that S acts on the left and right of K(S). Following [5, Chpt. 8], S is
said to be generalized group mapping over its kernel if S acts faithfully on both
the left and right of K(S). It is shown [5, Chpt. 8] that S is generalized group
mapping over K(S) if and only if the following congruence on S is the equality
relation:

$s_1 \equiv s_2 \iff \forall k_1, k_2 \in K(S),\ k_1 s_1 k_2 = k_1 s_2 k_2$. (3.1)

One says that S is group mapping over K(S) if either S = 1 or S is generalized
group mapping over K(S) and K(S) contains a non-trivial group.

Recall that a simple semigroup S is always isomorphic to a regular Rees
matrix semigroup $\mathcal{M}(G, A, B, C)$, where A, B are sets, G is a group,
called the maximal subgroup of S, and $C : B \times A \to G$ is a matrix, cf.
[5, Chpt. 7]. It follows easily from [5, Chpt. 8, Fact 2.22] that a simple
semigroup S is group mapping if and only if G is non-trivial and no two rows of C
are proportional on the left and no two columns of C are proportional on the right.
In particular, any group is group mapping.
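
To make the definitions concrete, the following Python sketch builds a small Rees matrix semigroup and checks by brute force that the congruence (3.1) is the equality relation. The instance is hypothetical and chosen only for illustration: $G = \mathbb{Z}_2$ written additively, $A = B = \{0, 1\}$, and a structure matrix C with entries C[b][a]. The sketch assumes the usual multiplication rule $(a, g, b)(a', g', b') = (a,\ g\,C(b, a')\,g',\ b')$, and uses that the semigroup is simple, so K(S) = S.

```python
from itertools import product

# Hypothetical instance: G = Z_2 (additive), A = B = {0, 1}.
G = (0, 1)
A = B = (0, 1)
C = ((0, 0),
     (0, 1))                          # structure matrix, entries C[b][a]

S = list(product(A, G, B))            # elements are triples (a, g, b)


def mul(s, t):
    """Rees matrix product, written additively in G = Z_2."""
    (a, g, b), (a2, g2, b2) = s, t
    return (a, (g + C[b][a2] + g2) % 2, b2)


def is_associative():
    return all(mul(mul(r, s), t) == mul(r, mul(s, t))
               for r, s, t in product(S, repeat=3))


def separated(s1, s2):
    """True if some k1, k2 in K(S) = S give k1 s1 k2 != k1 s2 k2."""
    return any(mul(mul(k1, s1), k2) != mul(mul(k1, s2), k2)
               for k1, k2 in product(S, repeat=2))


def congruence_is_equality():
    return all(separated(s1, s2)
               for s1, s2 in product(S, repeat=2) if s1 != s2)
```

For this matrix no two rows and no two columns are proportional, and `congruence_is_equality()` returns `True`, as the criterion quoted above predicts.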

3.2 Kovacs-Newman Semigroups 

We now define a KN semigroup: A non-trivial semigroup S is a Kovács-Newman
semigroup (KN semigroup) if whenever there is a diagram

$S \overset{\varphi}{\twoheadleftarrow} T \ll T_1 \times T_2$ (3.2)

$\varphi$ factors through one of the projections. As in the case of groups, it is
clear that if S is a KN semigroup, then $HSP_f(S)$ is ji; moreover, non-isomorphic
KN semigroups generate distinct pseudovarieties.

It is not clear a priori that a KN group is a KN semigroup. This will be a
consequence of our main result stating: if S is group mapping over K(S) and
the maximal subgroup of K(S) is a KN group, then S is a KN semigroup. In
particular, each KN group is a KN semigroup.

To prove this, we shall need a special case of [5, Chpt. 8, Prop. 3.28], which
highlights the importance of group mapping semigroups by saying that
homomorphisms to group mapping semigroups “have kernels.”

Proposition 3.1. Suppose that $\varphi : T \to S$ is a surjective homomorphism and
that S is group mapping over K(S). Let H be a maximal subgroup of K(T) and
suppose $\psi : T \to T'$ is a surjective homomorphism such that
$\ker \psi|_H \le \ker \varphi|_H$. Then $\varphi$ factors through $\psi$.
Proof. Suppose that $t_1, t_2 \in T$ and $t_1\psi = t_2\psi$; we need to show that
$t_1\varphi = t_2\varphi$. Since S is group mapping, to do this it suffices, by
(3.1), to show that, for all $k_1, k_2 \in K(S)$,

$k_1(t_1\varphi)k_2 = k_1(t_2\varphi)k_2$.

Since $K(S) = K(T)\varphi$, this is equivalent to showing, for all
$j_1, j_2 \in K(T)$,

$(j_1 t_1 j_2)\varphi = (j_1 t_2 j_2)\varphi$.

Set $t_i' = j_1 t_i j_2$, i = 1, 2. First observe that
$t_1' \mathrel{\mathcal{H}} t_2'$ (by finiteness [5]) and so they belong to the same
maximal subgroup G of K(T). By, say, Green’s Lemma or Rees’s Theorem [5], there
exist $x, y, x', y' \in K(T)$ such that $u \mapsto xuy$ is a bijection from G to H
with inverse given by $v \mapsto x'vy'$. Let $K = \ker \varphi|_H$ and
$N = \ker \psi|_H$, so $N \le K$ by hypothesis. Since $t_1\psi = t_2\psi$,
$(xt_1'y)\psi = (xt_2'y)\psi$ and so $xt_1'yN = xt_2'yN$. Since $N \le K$, we have
$xt_1'yK = xt_2'yK$, that is, $(xt_1'y)\varphi = (xt_2'y)\varphi$. Thus

$t_1'\varphi = (x'xt_1'yy')\varphi = (x'xt_2'yy')\varphi = t_2'\varphi$,

completing the proof. □

The following theorem, along with Theorems 2.12 and 3.6, can be viewed as
the principal results of this paper.

Theorem 3.2. Let S be a semigroup that is group mapping over K(S) and such
that K(S) has a maximal subgroup G that is a Kovács-Newman group. Then S is
a Kovács-Newman semigroup. In particular, every KN group is a KN semigroup.

Proof. Suppose we have a diagram as in (3.2); let $\pi_i : T \to T_i$ be the
projection. Since $K(S) = K(T)\varphi$, standard results about simple semigroups
(cf. [5, Chpt. 7]) show that there is a maximal subgroup $H \le K(T)$ such that
$H\varphi = G$. Let $K = \ker \varphi|_H$; then $G = H/K$. Clearly
$H \ll H\pi_1 \times H\pi_2$. Setting $N_i = \ker \pi_i|_H$, we have, since G is a
KN group, $N_i \le K$ for some i. Proposition 3.1 then implies $\varphi$ factors
through $\pi_i$. □

Corollary 3.3. Let S be a semigroup that is group mapping over K(S) and such
that the maximal subgroup of K(S) is monolithic with non-Abelian monolith.
Then S is a KN semigroup.

An open question is to describe all KN semigroups².

3.3 Applications 

Our first application is the following theorem. 

Theorem 3.4. The pseudovariety CS of (completely) simple semigroups is fi- 
nite join irreducible. 

^ Since this paper was submitted, the authors were able to prove (using in part L. G. 
Kovacs solution to the group case) the converse of Corollary 3.3, thus resolving this 
question. 
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Proof. It is well known [5, Chpt. 8] that every simple semigroup is a subdirect 
product of a right zero semigroup, a left zero semigroup and a group mapping 
simple semigroup. A non-trivial group G can be embedded into the group map- 
ping simple semigroup S = Ai{G,2,2,C) with the structure matrix 




where $1 \neq g \in G$. Moreover, the two-element left and right zero semigroups
divide S, so we may conclude that $\mathbf{CS}$ is generated by the collection of all group
mapping simple semigroups that are not groups.

Let $\mathcal{M}(G, A, B, C)$ be a group mapping simple semigroup, which is not a
group, and suppose $G \le A_n$ with $n \ge 5$. Then

$$\mathcal{M}(G, A, B, C) \le \mathcal{M}(A_n, A, B, C)$$

(where we now view C as a matrix over $A_n$) and the latter semigroup is group
mapping. It follows that $\mathbf{CS}$ is generated by group mapping simple semigroups
with structure group $A_n$, $n \ge 5$. Each such semigroup generates a ji
pseudovariety by Corollary 3.3.

We now show that given two such semigroups

$$S_1 = \mathcal{M}(A_n, A, B, C), \quad S_2 = \mathcal{M}(A_j, A', B', C')$$

there is such a semigroup containing them both. First observe that we may
assume n = j by replacing the smaller index by the larger one.

Now construct a matrix P as follows. Without loss of generality, we may
assume that C has at least as many rows as C'. Then add to C' as many rows of
1's as needed in order to obtain a matrix C'' with the same number of rows as C.
Let $P = (C \;\; C'')$. No two rows of P are proportional on the left, since no two rows
of C are proportional on the left; however P may have some columns proportional
on the right. So we identify the proportional columns of P to obtain a new matrix
P'. Since multiplying a column on the right by a scalar and changing the order
of the columns does not change a Rees matrix semigroup, the resulting Rees
matrix semigroup over $A_n$ with structure matrix P' contains a copy of $S_1$ and
$S_2$.

We may conclude that $\mathbf{CS}$ is the directed supremum of ji pseudovarieties and
so an application of Lemma 2.1 establishes the theorem. □

Recall that a semigroup S is said to be an ideal extension of R by T if R is
an ideal of S and S/R = T. For a semigroup S, we use $S^0$ to denote S with an
adjoined zero and $S^I$ to denote S with an adjoined identity (in both cases, even
if S already had one).

Lemma 3.5. Let S be a semigroup, G be a group and $1 \neq g \in G$. Then there is
a semigroup S(G,g) such that:

1. S(G,g) is an ideal extension of K(S(G,g)) by $S^0$;

2. G is the maximal subgroup of K(S(G,g));

3. S(G,g) is group mapping over K(S(G,g)).

Moreover, if $\varphi : G \to H$ is a surjective homomorphism and $g\varphi \neq 1$, then there
is a natural surjective morphism $\overline{\varphi} : S(G,g) \to S(H, g\varphi)$ such that $\overline{\varphi}$ is injective
on S and $\overline{\varphi}|_G = \varphi$.



Proof. We first construct a semigroup $S_0(G,g)$ meeting the requirements 1-2. Let
$K(G,g) = \mathcal{M}(G, A, B, P_0)$ be constructed as follows. We set $B = S^I$. Choose
$1 \neq g \in G$. Construct the $(n+1) \times (n+1)$ matrix

$$P = \begin{pmatrix} g & 1 & \cdots & 1 \\ 1 & g & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 1 & 1 & \cdots & g \end{pmatrix}$$

The rows of P shall be indexed by $S^I$ and the columns by $\{a_{I,1}, \ldots, a_{I,n+1}\}$.
For each $s \in S$, we create n + 1 new columns as follows: Column $a_{s,j}$ has entry
in row $s_1 \in S^I$ equal to the entry in row $s_1 s$, column $a_{I,j}$. The resulting matrix
is denoted $P_0$.

We now form $S_0(G,g) = S \cup K(G,g)$ where multiplication of elements of
K(G,g) by elements of S is defined as follows:

$$(a_{s_1,j}, h, s_2)\, s = (a_{s_1,j}, h, s_2 s), \quad \text{with } s_1, s_2 \in S^I,\ s \in S,\ h \in G$$
$$s\, (a_{s_1,j}, h, s_2) = (a_{s s_1,j}, h, s_2), \quad \text{with } s_1, s_2 \in S^I,\ s \in S,\ h \in G$$



It is an exercise in the linked equations [5, Chpt. 7] to verify $S_0(G,g)$ is a
semigroup with $K(S_0(G,g)) = K(G,g)$. Define a congruence $\equiv$ on $S_0(G,g)$ by
$t_1 \equiv t_2$ if and only if $k_1 t_1 k_2 = k_1 t_2 k_2$ for all $k_1, k_2 \in K(G,g)$. Then
$S(G,g) = S_0(G,g)/{\equiv}$ satisfies 2 and 3; see [5]. We need to show that 1 is satisfied.
To prove this, it suffices to show that $\equiv$ does not identify elements of S. Suppose
$s, s' \in S$ are distinct. By construction of $P_0$, there is a column $a_{I,j}$ such that the
entry in row s is g and the entry in row s' is 1. Then

$$(a_{I,1}, 1, I)\, s\, (a_{I,j}, 1, I) = (a_{I,1}, 1, s)(a_{I,j}, 1, I) = (a_{I,1}, g, I)$$
$$(a_{I,1}, 1, I)\, s'\, (a_{I,j}, 1, I) = (a_{I,1}, 1, s')(a_{I,j}, 1, I) = (a_{I,1}, 1, I)$$

and so $s \not\equiv s'$, as desired.

Suppose now that $\varphi : G \to H$ is a surjective homomorphism and $g\varphi \neq 1$.
The map $\Phi : S_0(G,g) \to S_0(H, g\varphi)$ defined by $s\Phi = s$ for $s \in S$ and
$(a, g', b)\Phi = (a, g'\varphi, b)$ for $(a, g', b) \in K(G,g)$ is clearly a surjective
homomorphism and $\Phi|_G = \varphi$. Since $K(S_0(G,g))$ is the minimal $\mathcal{J}$-class mapping
onto $K(S_0(H, g\varphi))$, the results of [5, Chpt. 8] immediately imply that there is an
onto homomorphism $\overline{\varphi} : S(G,g) \to S(H, g\varphi)$ such that

$$\begin{array}{ccc} S_0(G,g) & \xrightarrow{\ \Phi\ } & S_0(H, g\varphi) \\ \downarrow & & \downarrow \\ S(G,g) & \xrightarrow{\ \overline{\varphi}\ } & S(H, g\varphi) \end{array} \qquad (3.3)$$

commutes. From the commutativity of (3.3) and (1) for $S(H, g\varphi)$, it immediately
follows that $\overline{\varphi}$ is injective on S and $\overline{\varphi}|_G = \varphi$. □



Let $\mathbf{H}$ be a group pseudovariety; denote by $\overline{\mathbf{H}}$ the pseudovariety of semigroups
with subgroups in $\mathbf{H}$. If $\mathbf{V}$ is a pseudovariety, set $\mathbf{V}(\mathbf{H}) = \mathbf{V} \cap \overline{\mathbf{H}}$. Let $\mathbf{CR}$ be the
pseudovariety of completely regular semigroups and let $\mathbf{DS}$ be the pseudovariety
of semigroups whose regular $\mathcal{J}$-classes are subsemigroups. The main application
of this paper is the next theorem and its corollaries.

Theorem 3.6. Suppose that $\mathbf{H}$ is a pseudovariety of groups containing a
non-nilpotent group. Let $\mathbf{V}$ be a pseudovariety of semigroups containing a non-trivial
semilattice and closed under ideal extensions of elements of $\mathbf{CS}(\mathbf{H})$ by elements
of $\mathbf{V}$. Then $\mathbf{V}$ is finite join irreducible.



Proof. First note that $\mathbf{V}$ is closed under the operation of adjoining a (new) zero
since it contains a non-trivial semilattice [4].

We claim that $\mathbf{H}$ contains a monolithic group G with non-central monolith
M. Indeed, let $G \in \mathbf{H}$ be a non-nilpotent group of minimal order. Clearly G
must be subdirectly indecomposable; let M be its monolith. By choice of G,
G/M is nilpotent. If M were central, then G would be a central extension by a
nilpotent group and hence nilpotent, contradicting the choice of G.

Let $S \in \mathbf{V}$ and let $g \notin C_G(M)$. Set $S' = S(G,g)$. By hypothesis, $S' \in \mathbf{V}$;
clearly (1) of Lemma 2.2 holds; we show that (2) holds.

Suppose $S' \in \mathbf{V}_1 \vee \mathbf{V}_2$. Then there is a diagram $S' \stackrel{\varphi}{\twoheadleftarrow} T \ll T_1 \times T_2$ with
$T_i \in \mathbf{V}_i$, i = 1,2. As in the proof of Theorem 3.2, $K(T)\varphi = K(S')$ and there
is a maximal subgroup H of K(T) with $H\varphi = G$. Also $H \ll H\pi_1 \times H\pi_2$,
where $\pi_i : T \to T_i$ is the projection, i = 1,2. Set $N_i = \ker \pi_i|_H$, i = 1,2. By
Theorem 2.8, $N_i \le C_H(M)$ for some i, say i = 1.

Let $\rho : G \to G/C_G(M)$ be the projection and consider the map
$\overline{\rho} : S(G,g) \to S(G/C_G(M), g\rho)$ as per Lemma 3.5; note that $g\rho \neq 1$. Then
$N_1 \le C_H(M) = \ker(\varphi\overline{\rho})|_H$, so, by Proposition 3.1, the quotient
$\varphi\overline{\rho} : T \to S(G/C_G(M), g\rho)$ factors through $\pi_1$. Since $S \le S(G/C_G(M), g\rho)$,
it follows that S divides $T_1$ and so $S \in \mathbf{V}_1$. This completes the proof that (2) of
Lemma 2.2 holds, establishing the theorem. □



Corollary 3.7. The pseudovarieties $\mathbf{S}$, $\mathbf{CR}$ and $\mathbf{DS}$ are all finite join
irreducible. More generally, suppose $\mathbf{H}$ is a pseudovariety of groups containing a
non-nilpotent group. Then $\overline{\mathbf{H}}$, $\mathbf{CR}(\mathbf{H})$ and $\mathbf{DS}(\mathbf{H})$ are all finite join irreducible.
Hence none of these pseudovarieties has a maximal proper subpseudovariety.

Since the only nilpotent pseudovarieties of groups closed under semidirect
product are $\mathbf{1}$ and $\mathbf{G}_p$ (p-groups), we have recovered all of the join results of [10]
with the exceptions of $\mathbf{A} = \overline{\mathbf{1}}$ (aperiodic semigroups) and $\overline{\mathbf{G}}_p$. In fact, our results
are stronger since we prove fji rather than sfji.

Abstract. In a preceding paper (Bruyère and Carton, Automata on linear
orderings, MFCS'01), automata have been introduced for words indexed
by linear orderings. These automata are a generalization of automata
for finite, infinite, bi-infinite and even transfinite words studied
by Büchi. Kleene's theorem has been generalized to these words. We show
that deterministic automata do not have the same expressive power.
Despite this negative result, we prove that rational sets of words of finite
ranks are closed under complementation.



1 Introduction 

Automata were first introduced by Kleene who showed that they have the same 
expressive power as rational expressions [13]. Since then, many extensions of this 
deep result have been proved. Different kinds of structures have been considered 
like infinite words [7,14], bi-infinite words [11,15] and transfinite words [9,10,22], 
finite and infinite trees [18], finite and infinite traces, pictures, etc. 

Automata that accept linearly ordered structures were introduced in [2,3].
These automata are a simple and natural generalization of usual automata
with additional limit transitions of the form $P \to q$ and $q \to P$ where P is a
subset of states. They make it possible to treat finite words, infinite words,
bi-infinite words and transfinite words in the same framework. These automata
were proved to be equivalent to some rational expressions when the orderings are
restricted to scattered orderings. Recall that scattered orderings are those
orderings which do not contain a dense subordering like $\mathbb{Q}$. They include the
ordinals and their mirrors.

One main property of rational sets is the closure under complementation. 
It means that for any automaton A, there is another automaton B accepting 
exactly the structures that are not accepted by A. This property holds for almost 
all structures: finite and infinite words, finite and infinite trees and even for 
transfinite words on ordinals. 

This property is important from both the practical and the theoretical point of
view. It means that the class of rational sets forms an effective boolean algebra.
It is used whenever some logic is translated into automata. For instance, in both
proofs of the decidability of the monadic second-order theory of the integers by
Büchi [8] and the decidability of the monadic second-order theory of the infinite
binary tree by Rabin [18], the closure under complementation of automata is the
key property. It is well known that automata have the same expressive power
as the monadic second-order theory on many structures like finite, infinite and
transfinite words and trees. A nice result would be to extend this equivalence to
linear orderings. Proving the closure under complementation is one step towards
this result.

In [3], the closure under complementation was left open. In this paper, we
address this problem and we solve it for a subclass of scattered linear orderings.
Namely, we prove that rational sets of words on scattered orderings of
finite ranks are closed under complementation. Recall that Hausdorff's result
[12] states that scattered orderings can be obtained from the finite orderings by
repeated applications of $\omega$-sums and $-\omega$-sums (see Theorem 1). The rank of a
scattered linear ordering is the number of nested $\omega$-sums and $-\omega$-sums needed
to obtain it. The ranks of all countable scattered linear orderings range over all
countable ordinals. The rank can be seen as a measure of the complexity of the
ordering. For instance, $\omega$ and $\zeta$ are scattered orderings of rank 1. Our result
generalizes the complementation of both infinite and bi-infinite words. The class
of scattered orderings of finite rank includes ordinals smaller than $\omega^\omega$. Therefore,
our result holds for sets of transfinite words studied by Choueka [10].

The classical method to get an automaton for the complement of a set of
finite words accepted by an automaton A is through determinization [1]. Another
method uses algebraic objects like semigroups [17]. The determinization method
can still be used for infinite words but it becomes more involved [21,4]. This
method has been pushed further by Büchi for countable transfinite words but it
is then very complex [9]. The algebraic method can also be extended to ordinals
[5,6]. In our case, the determinization method cannot be applied since automata
cannot be made deterministic. In this paper, we give an example of a rational set
of words that cannot be accepted by a deterministic automaton. Therefore, we use
another method which was introduced by Büchi for infinite words. It is based on an
equivalence relation on words whose classes are shown to be rational.

The paper is organized as follows. In Section 2, we introduce words indexed by
linear orderings and recall the Hausdorff characterization of countable scattered
linear orderings. Then rational sets of words are defined from rational operators
and automata in Section 3. We finally prove in Section 4 that rational sets of
words indexed by countable scattered linear orderings of finite ranks are closed
under complementation.



2 Words on Linear Orderings 

In this section, we recall some definitions and operations on linear orderings but
we refer the reader to [20] for a complete introduction to linear orderings. We
give Hausdorff's characterization of countable scattered linear orderings
and introduce words indexed by linear orderings.

Let J be a set equipped with an order <. The ordering J is linear if for any
distinct j and k in J, either j < k or k < j. A linear ordering J is dense if for any
j and k in J such that j < k, there exists an element i of J such that j < i < k. It is
scattered if it contains no dense subordering. The ordering $\omega$ of natural integers
and the ordering $\zeta$ of relative integers are scattered. More generally, ordinals are
scattered orderings.

Let A be a finite alphabet. A word $x = (a_j)_{j \in J}$ indexed by a linear ordering
J is a function from J to A. J is called the length of x. For instance $\omega$ is the
length of right-infinite words $a_0 a_1 \ldots$ and $\zeta$ is the length of bi-infinite words
$\ldots a_{-1} a_0 a_1 \ldots$

In order to define the rank of scattered linear orderings, we recall some operations
on linear orderings.



2.1 Operations on Linear Orderings 

For any linear ordering J, we denote by $-J$ the backward linear ordering that
is the set J equipped with the reverse ordering. For instance, $-\omega$ is the linear
ordering of negative integers.

The sum J + K of two linear orderings is the disjoint union $J \cup K$ equipped with
the ordering < extending the orderings of J and K by setting j < k for any $j \in J$ and
$k \in K$. For instance, $\zeta = -\omega + \omega$. Formally, the sum $\sum_{j \in J} K_j$ is the set of all pairs
(k, j) such that $k \in K_j$ equipped with the ordering defined by $(k_1, j_1) < (k_2, j_2)$
if and only if $j_1 < j_2$ or ($j_1 = j_2$ and $k_1 < k_2$ in $K_{j_1}$).
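
The pair-based definition of the sum can be illustrated in code. The following Python fragment is only a minimal sketch and not part of the original paper: the names `sum_less`, `sort_sample`, `usual` and `two` are hypothetical, and only a finite sample of elements is ever compared. It encodes $\zeta = -\omega + \omega$ as the sum over $J = \{0, 1\}$ of the negative integers and the natural integers.

```python
from functools import cmp_to_key

def sum_less(p, q, index_less, component_less):
    """Strict order on pairs (k, j) of the sum of the orderings K_j, j in J."""
    (k1, j1), (k2, j2) = p, q
    if j1 != j2:
        return index_less(j1, j2)        # different indices: compare in J
    return component_less[j1](k1, k2)    # same index: compare inside K_j

def sort_sample(pairs, index_less, component_less):
    """Sort a finite sample of pairs according to the sum ordering."""
    def cmp(p, q):
        if sum_less(p, q, index_less, component_less):
            return -1
        if sum_less(q, p, index_less, component_less):
            return 1
        return 0
    return sorted(pairs, key=cmp_to_key(cmp))

# Usual order on integers, used both for J = {0, 1} and for each K_j.
usual = lambda a, b: a < b
two = {0: usual, 1: usual}

# zeta = -omega + omega: negative integers carry index 0, naturals index 1.
sample = [(3, 1), (-1, 0), (0, 1), (-7, 0)]
ordered = sort_sample(sample, usual, two)
```

In the same encoding, every element of the first copy of $\omega$ in $\omega + \omega$ precedes every element of the second copy, whatever the values of the first coordinates.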

The sum of linear orderings helps to define the lengths of the products of
words. Let J be a linear ordering and let $(x_j)_{j \in J}$ be words of respective length
$K_j$ for any $j \in J$. The word $x = \prod_{j \in J} x_j$ obtained by concatenation of the words
$x_j$ with respect to the ordering on J is of length $L = \sum_{j \in J} K_j$. We call J-product a
product indexed by the ordering J. For instance, the $\omega$-product of the word $a^\omega$ is
the word of length $\sum_{j \in \omega} \omega$. The sequence $(x_j)_{j \in J}$ of words is a J-factorization
of the word $x = \prod_{j \in J} x_j$.



2.2 Construction of Countable Scattered Linear Orderings 

Countable scattered linear orderings are defined through a forbidden pattern,
namely that they do not contain a dense subordering. Hausdorff's theorem states
that they can be constructed from finite orderings.

We denote by $\mathcal{N}$ the subclass of finite linear orderings, $\mathcal{O}$ the class of countable
ordinals and $\mathcal{S}$ the class of countable scattered linear orderings.

Theorem 1. [12] A countable linear ordering J is scattered if and only if J
belongs to $\bigcup_{\alpha \in \mathcal{O}} V_\alpha$ where the classes $V_\alpha$ are inductively defined by:

1. $V_0 = \{0, 1\}$




Complementation of Rational Sets on Scattered Linear Orderings 295 



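2. $V_\alpha = \{ \sum_{j \in J} K_j \mid J \in \mathcal{N} \cup \{\omega, -\omega, \zeta\}$ and $K_j \in \bigcup_{\beta < \alpha} V_\beta \}$

where 0 and 1 are respectively the orderings of zero and one element.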

Intuitively, the rank of a linear ordering is the maximum number of nested
$\omega$ and $-\omega$. It is linked to its Hausdorff class. For instance the orderings $\omega$ of
rank 1 and $\omega^2$ of rank 2 belong respectively to $V_1$ and $V_2$. Nevertheless, the class
$V_\alpha$ is not exactly the set of orderings of rank $\alpha$. For instance, the ordering $\omega + \omega$
is of rank 1 and belongs to $V_2$. Therefore, we work on slightly different inductive
classes. For any $\alpha \in \mathcal{O}$, we define the class $W_\alpha$ by:

$$W_\alpha = \Big\{ \sum_{j \in J} K_j \;\Big|\; J \in \mathcal{N} \text{ and } K_j \in V_\alpha \Big\}$$



Those classes are strictly intermediate to the Hausdorff ones: the inclusions
$V_\alpha \subsetneq W_\alpha \subsetneq V_{\alpha+1}$ hold for any ordinal $\alpha$. For instance, the ordering
belongs to $W_\alpha$ but does not belong to $V_\alpha$ and the ordering belongs to
$V_{\alpha+1}$ but does not belong to $W_\alpha$. Formally, the rank of a linear ordering J is
the smallest ordinal $\alpha$ such that $J \in W_\alpha$. For instance the orderings of rank 0
are the finite ones. In this paper, we restrict to linear orderings of finite ranks,
that is, the set $\bigcup_{n < \omega} W_n = \bigcup_{n < \omega} V_n$.

By extension, the rank of a word is the rank of its length and the rank of a
set of words is the upper bound of the ranks of its elements.

We denote by $A^{\diamond}$ the set of all words indexed by countable scattered linear
orderings and we also denote by $A^{W_r}$ (respectively $A^{W}$) the set of words whose
length is an ordering in $W_r$ (respectively $W$) for some integer r. Thus the words
of $A^{W_r}$ have a rank lower than or equal to r.



3 Rational Sets of Words on Linear Orderings 

Bruyère and Carton [2] have introduced rational expressions and automata for
words indexed by countable scattered linear orderings. They have proved that
a set of words is rational if and only if it is recognizable, extending Kleene's
theorem. More precisely, they have defined a whole hierarchy of rational sets [3].
For each subset of rational operations, they consider the class of corresponding
rational languages and define transitions of automata capturing the same
languages. The following sections recall the characterization of rational sets of
words of finite rank.



3.1 Rational Expressions 

The rational sets of finite rank can be obtained from finite sets of finite words
using the union +, the concatenation $\cdot$, the star *, the omega iteration $\omega$ and the
backwards omega iteration $-\omega$. Let X and Y be two sets of words, we define:

$$X + Y = \{ z \mid z \in X \cup Y \}$$
$$X \cdot Y = \{ x \cdot y \mid x \in X, y \in Y \}$$
$$X^* = \Big\{ \prod_{j=1}^{n} x_j \;\Big|\; n \in \mathcal{N}, x_j \in X \Big\}$$
$$X^{\omega} = \Big\{ \prod_{j \in \omega} x_j \;\Big|\; x_j \in X \Big\}$$
$$X^{-\omega} = \Big\{ \prod_{j \in -\omega} x_j \;\Big|\; x_j \in X \Big\}$$

To define rational sets of words indexed by all linear orderings, three more
operations are needed: the ordinal iteration $\sharp$, the backwards ordinal iteration $-\sharp$
and the iteration for all linear countable scattered orderings $\diamond$.

$$X^{\sharp} = \Big\{ \prod_{j \in J} x_j \;\Big|\; J \in \mathcal{O}, x_j \in X \Big\}$$
$$X^{-\sharp} = \Big\{ \prod_{j \in -J} x_j \;\Big|\; J \in \mathcal{O}, x_j \in X \Big\}$$
$$X \diamond Y = \Big\{ \prod_{j \in J \cup \hat{J}^*} z_j \;\Big|\; J \in \mathcal{S} \setminus \{\emptyset\},\ z_j \in X \text{ if } j \in J \text{ and } z_j \in Y \text{ if } j \in \hat{J}^* \Big\}$$

In this paper, we are only interested in languages which are defined using +, $\cdot$, *,
$\omega$ and $-\omega$. We refer the reader to [2] for a precise definition of other rational
operations. A set of words on linear orderings is rational if it is obtained from
finite sets of finite words using the rational operations defined above.

3.2 Automata on Linear Orderings 

Let $(Q, A, E, I, F)$ be a classical automaton on finite words with usual notations.
As the set E of transitions is a subset of $Q \times A \times Q$, the paths of such an automaton
are finite. In Büchi automata, a word is accepted if it is the label of a path going
infinitely many times through a given set of states. The problem is that this accepting
condition does not even make it possible to recognize the concatenation of infinite
words. To cope with this difficulty, a set of limit transitions included in $\mathcal{P}(Q) \times Q$ is
introduced. This way, if an infinite path goes infinitely many times through the
states of a set P and the transition (P, q) exists, then the next state of the
path may be q.

Example 1. Let $\mathcal{A} = (Q, A, E, I, F)$ be the automaton of Figure 1 where
$Q = \{1, 2, 3\}$, $A = \{a, b\}$, $I = \{1\}$, $F = \{3\}$.

Fig. 1. Automaton recognizing $a^* b^\omega$

A limit transition $\{2\} \to 3$ is added to E. Intuitively, an infinite path going
through the state 2 infinitely many times leads to state 3 and a path in $\mathcal{A}$
leading from state 2 to state 3 is labelled $b^\omega$. Finally, this automaton recognizes
the language $a^* b^\omega$.

The previous limit transitions, called left limit transitions, make it possible to
recognize sets of words indexed by countable ordinals. In order to get words indexed
by linear scattered orderings, we also need right limit transitions.

Definition 1. An automaton $\mathcal{A}$ on linear orderings is defined by
$\mathcal{A} = (Q, A, E, I, F)$ where Q is a finite set of states, A is a finite alphabet,
$E \subseteq (Q \times A \times Q) \cup (\mathcal{P}(Q) \times Q) \cup (Q \times \mathcal{P}(Q))$ is the set of transitions and
$I \subseteq Q$ and $F \subseteq Q$ are respectively the sets of initial and final states.

Right limit transitions are used symmetrically when a path has a limit length on
the left. In order to use nested limit transitions, we need to define the left
(respectively right) limit sets of states at a given point of the path.

Consider a finite path $q_0 \xrightarrow{a_1} q_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} q_n$ labelled $x = a_1 \ldots a_n$. Note that a
state is inserted between any two consecutive letters of x. In other words, to
any two-factorization $x = (a_1 \ldots a_k)(a_{k+1} \ldots a_n)$ of x is associated a state $q_k$.
This definition of paths is generalized to automata on linear orderings in the
following way: Let x be a word indexed by a linear scattered ordering J. To any
two-factorization x = yz of x, one can associate a partition of J into two intervals
(K, L) such that |y| = K and |z| = L. Then, a path labelled x is a function from
the set $\hat{J} = \{ (K, L) \mid K \cup L = J \wedge \forall k \in K, \forall l \in L, k < l \}$ into the set of states.
As the set $\hat{J}$ is naturally equipped with the ordering $(K_1, L_1) < (K_2, L_2)$ if and
only if $K_1 \subsetneq K_2$, a path labelled by a word of length J is a word over Q of length
$\hat{J}$. An element of $\hat{J}$ is called a cut.
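
For a finite ordering the notion of cut is easy to enumerate, which may help to see why a path over a word of length n carries n + 1 states. The following Python fragment is a minimal sketch under that finiteness assumption; the function names `cuts` and `cut_less` are hypothetical and do not come from the paper.

```python
def cuts(ordering):
    """All cuts (K, L) of a finite linear ordering given as a sorted list.

    K is an initial segment, L the complementary final segment.
    """
    return [(tuple(ordering[:i]), tuple(ordering[i:]))
            for i in range(len(ordering) + 1)]

def cut_less(c1, c2):
    """(K1, L1) < (K2, L2) if and only if K1 is a proper subset of K2."""
    return set(c1[0]) < set(c2[0])

# The ordering J = {0, 1, 2} is the length of a three-letter word.
all_cuts = cuts([0, 1, 2])
```

Here `all_cuts` has four elements, from the least cut with empty K to the greatest cut with empty L, so a path labelled by a three-letter word is a sequence of four states.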

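Let $\gamma = (q_c)_{c \in \hat{J}}$ be a word of length $\hat{J}$ over Q. We are now able to define the
limit sets of states of $\gamma$ in a given cut c of $\hat{J}$:

$$\lim_{c^-} \gamma = \{ q \in Q \mid \forall c' < c,\ \exists c'' \text{ such that } c' < c'' < c \text{ and } q = q_{c''} \}$$
$$\lim_{c^+} \gamma = \{ q \in Q \mid \forall c' > c,\ \exists c'' \text{ such that } c < c'' < c' \text{ and } q = q_{c''} \}$$

For instance, in Example 1, the word $\gamma = (q_c)_{c \in \hat{\omega}}$ defined by $q_{(\emptyset, \omega)} = 1$,
$q_{(\{0, 1, \ldots, n\}, \{n+1, \ldots\})} = 2$ for any integer $n \ge 0$ and $q_{(\omega, \emptyset)} = 3$ has the following
nonempty limit: $\lim_{(\omega, \emptyset)^-} \gamma = \{2\}$.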

Finally, a path has to be compatible with the automaton transitions:

Definition 2. Let $\mathcal{A} = (Q, A, E, I, F)$ be an automaton on linear orderings and
let $x = (a_j)_{j \in J}$ be a word of length J on A.

A path $\gamma$ of label x in $\mathcal{A}$ is a word $\gamma = (q_c)_{c \in \hat{J}}$ of length $\hat{J}$ over Q such that for
any $(K, L) \in \hat{J}$:

- If there exists $l \in L$ such that $(K \cup \{l\}, L \setminus \{l\}) \in \hat{J}$
  then $q_{(K,L)} \xrightarrow{a_l} q_{(K \cup \{l\}, L \setminus \{l\})} \in E$, else $q_{(K,L)} \to \lim_{(K,L)^+} \gamma \in E$.

- If there exists $k \in K$ such that $(K \setminus \{k\}, L \cup \{k\}) \in \hat{J}$
  then $q_{(K \setminus \{k\}, L \cup \{k\})} \xrightarrow{a_k} q_{(K,L)} \in E$, else $\lim_{(K,L)^-} \gamma \to q_{(K,L)} \in E$.

Thus, if a cut has a predecessor or a successor, usual transitions are used; otherwise
the path is built on limit transitions. As $\hat{J}$ has the least element $(\emptyset, J)$ and the
greatest element $(J, \emptyset)$ for any linear ordering J, a path always has a first and
a last state. It is said to be successful if it leads from an initial state to a final
state. A word is recognized by an automaton if it is the label of a successful path.
We denote by $p \stackrel{x}{\Longrightarrow} q$ the existence of a path leading from state p to q of label
x. The content of a path is the set of states occurring in the path and
$p \underset{P}{\stackrel{x}{\Longrightarrow}} q$ denotes a path leading from p to q of label x and of content P.



Fig. 2. Automaton on linear orderings with limit transitions $0 \to \{1\}$ and $\{0, 1\} \to 2$



3.3 Generalisations of Kleene’s Theorem 

Bruyère and Carton have generalized Kleene's theorem on words indexed by
countable scattered linear orderings:

Theorem 2. [2] A set of words indexed by countable scattered linear orderings
is rational if and only if it is recognizable.

Moreover, they have defined a subclass of automata on linear orderings which
recognizes rational languages of finite ranks.

Theorem 3. [3] A set of words of finite rank is rational if and only if it is
recognized by an automaton on linear orderings where limit transitions $P \to q$ or
$q \to P$ satisfy $q \notin P$.

4 Complement of a Rational Set of Finite Rank 

In the case of finite words, it is known that rational sets are closed under
complementation. Given an automaton on finite words recognizing a language L,
the construction of an automaton recognizing the complement $A^* \setminus L$ is based
on the property that any finite automaton on finite words can be determinized.
Büchi has generalized this result for sets of words indexed by countable ordinals
of finite ranks [9]. This property does not hold any longer for automata on linear
orderings. An automaton on linear orderings $\mathcal{A} = (Q, A, E, I, F)$ is deterministic
if for any state $q \in Q$ and any word $u \in A^{\diamond}$, there exists at most one path
labelled u starting from q.

Proposition 1. The language cannot be recognized by a deterministic
automaton.

To cope with this lack of determinism, we use a different method based on
equivalence classes to prove the closure of rational sets under complementation.
Up to now, we are only able to prove this result in the case of rational sets of
words of finite ranks.

Theorem 4. Let L be a rational set of words on linear orderings and let r be a
finite integer. The complement $A^{W_r} \setminus L$ is rational.

In the case of finite words, Büchi has given a different proof of the closure under
complement of rational sets. It does not need the property of determinizability
but it is based on the following equivalence relation defined for any finite
automaton $\mathcal{A} = (Q, A, E, I, F)$ on finite words:

$$u \sim v \text{ if and only if } \forall p \in Q, \forall q \in Q,\ p \stackrel{u}{\Longrightarrow} q \iff p \stackrel{v}{\Longrightarrow} q$$

Note that if a word u is the label of a successful path in $\mathcal{A}$, so is any
equivalent word. So any equivalence class is either contained in the language L
recognized by $\mathcal{A}$ or disjoint from L. Moreover, equivalence classes are rational;
thus the complement of L is rational as a finite union of equivalence classes.
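
The argument for finite words can be checked on a small example. The Python fragment below is a hedged sketch, not taken from the paper: the two-state automaton (accepting the words over {a, b} that end with a) and all function names are assumptions chosen for illustration. The class of a word is represented by the set of pairs (p, q) joined by a path labelled with that word.

```python
from itertools import product

def reach(word, states, transitions):
    """Set of pairs (p, q) such that some path labelled `word` leads from p to q."""
    pairs = {(p, p) for p in states}              # the empty word relates p to p
    for letter in word:
        pairs = {(p, r)
                 for (p, q) in pairs
                 for (q2, a, r) in transitions
                 if q2 == q and a == letter}
    return frozenset(pairs)

def accepts(word, states, transitions, initial, final):
    """A word is accepted if it labels a path from an initial to a final state."""
    return any(p in initial and q in final
               for (p, q) in reach(word, states, transitions))

# Hypothetical example automaton: nondeterministic, accepts words ending with 'a'.
STATES = {0, 1}
TRANSITIONS = {(0, "a", 0), (0, "b", 0), (0, "a", 1)}
INITIAL, FINAL = {0}, {1}

def classes_up_to(length):
    """Group all words of length at most `length` by their equivalence class."""
    found = {}
    for n in range(length + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            found.setdefault(reach(word, STATES, TRANSITIONS), []).append(word)
    return found
```

On this example the words of length at most 4 fall into three classes (the empty word, the words ending with a, the nonempty words ending with b), and each class is either entirely accepted or entirely rejected, as the argument predicts.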
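We extend this proof to automata on linear orderings of finite ranks. Let
$\mathcal{A} = (Q, A, E, I, F)$ be an automaton on linear orderings recognizing L. Recall
that a path from p to q with label u and content P is denoted by $p \underset{P}{\stackrel{u}{\Longrightarrow}} q$. As
the contents of paths are needed in limit transitions, we define the equivalence
relation $\sim$ by:

$$u \sim v \text{ if and only if } \forall p \in Q, \forall q \in Q, \forall P \subseteq Q,\ p \underset{P}{\stackrel{u}{\Longrightarrow}} q \iff p \underset{P}{\stackrel{v}{\Longrightarrow}} q$$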

Note first that the equivalence relation has finitely many classes. Indeed the class
of a word u depends on whether there is a path from p to q with content P for
each triple (p, q, P). Since there are $n^2 2^n$ such triples, where n is the number of
states, the relation $\sim$ has at most $2^{n^2 2^n}$ equivalence classes. We denote by
$\mathcal{C}$ the set of all equivalence classes of $\sim$.
For each integer r, we denote by $\mathcal{C}_r = \{ C \cap A^{W_r} \mid C \in \mathcal{C} \}$ the set of equivalence
classes of rank r. The cardinality of $\mathcal{C}_r$ is at most the cardinality of $\mathcal{C}$. As in
the case of finite words, each class C is either contained in L or disjoint from L.
Therefore we have both equalities

$$L = \bigcup_{C \in \mathcal{C},\ C \cap L \neq \emptyset} C \quad \text{and} \quad \overline{L} = A^{\diamond} \setminus L = \bigcup_{C \in \mathcal{C},\ C \cap L = \emptyset} C$$
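
The counting step above can be made concrete. The following Python fragment is a small illustrative sketch (the function name `count_triples` is hypothetical): it enumerates the triples (p, q, P) for a given state set and compares their number with $n^2 2^n$; the bound on the number of classes is then 2 raised to that number.

```python
from itertools import combinations, product

def count_triples(states):
    """Number of triples (p, q, P) with p, q states and P a subset of the states."""
    states = list(states)
    subsets = [frozenset(c)
               for size in range(len(states) + 1)
               for c in combinations(states, size)]
    return sum(1 for _ in product(states, states, subsets))

n = 3
triples = count_triples(range(n))   # equals n**2 * 2**n
class_bound = 2 ** triples          # upper bound on the number of classes
```

Already for three states the bound is astronomically large, which is why the proof only relies on its finiteness and never enumerates the classes.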

The same holds for words of rank at most r.

$$L \cap A^{W_r} = \bigcup_{C \in \mathcal{C}_r,\ C \cap L \neq \emptyset} C \quad \text{and} \quad A^{W_r} \setminus L = \bigcup_{C \in \mathcal{C}_r,\ C \cap L = \emptyset} C.$$

For each integer r, the family $\mathcal{C}_r$ contains finitely many classes. To prove that
$A^{W_r} \setminus L$ is rational, it suffices to prove that each $C \in \mathcal{C}_r$ is rational. We prove
that claim by induction on r. The result holds obviously for r = 0 and the
induction step is based on the following idea. Suppose that $\mathcal{C}_r$ contains the
classes $\{C_1, \ldots, C_m\}$. We define rational expressions using the $C_i$ as letters. An
elementary expression is an expression of the form $C_i$, $C_i^{\omega}$ or $C_i^{-\omega}$ where $C_i$ is a
class of $\mathcal{C}_r$. We denote by B the set of elementary expressions. We consider the
set $B^*$ of all expressions obtained by concatenation of elementary expressions.
Suppose for instance that $\mathcal{C}_r = \{C_1, C_2\}$. The set of elementary expressions
is $B = \{C_1, C_2, C_1^{\omega}, C_2^{\omega}, C_1^{-\omega}, C_2^{-\omega}\}$ and a typical example of element of $B^*$ is
$C_2^{\omega} C_1 C_2^{-\omega} C_1 C_1^{\omega}$. We consider each element of $B^*$ as a rational expression over
the letters $C_i$. Each expression of $B^*$ denotes a set of words of rank at most r + 1.
By a slight abuse of language, we say that a word belongs to an expression R in
$B^*$ if it actually belongs to the set denoted by R. The two following lemmas are
needed in the proof of Proposition 2. Their proofs are not detailed in this paper
because of the lack of space. In Lemma 1, we first prove that each word of rank
at most r + 1 belongs to at least one expression in $B^*$.

Lemma 1. $A^{W_{r+1}} = \bigcup_{R \in B^*} R$.

In Lemma 2, we prove that two words belonging to the same expression are
$\sim$-equivalent. This means that each set denoted by an expression of $B^*$ is included
in a single $\sim$-class.

Lemma 2. If two words x, y of rank at most r + 1 belong to the same expression
R of $B^*$, then they satisfy $x \sim y$.

It follows from Lemmas 1 and 2 that each class C in $\mathcal{C}_{r+1}$ satisfies

$$C = \bigcup_{R \in B^*,\ C \cap R \neq \emptyset} R$$

However, this is not a rational expression since there are infinitely many such 
expressions R included in C. In the following proposition, we show that the set 
of rational expressions included in some class C can be described by a rational 
expression over the elementary expressions. 

Proposition 2. Each equivalence class in Cr is rational. 

The proof by induction on the rank r is not detailed in this paper. We come 
back to the proof of Theorem 4. 

Proof. Let $\mathcal{A}$ be an automaton on linear orderings recognizing L and let r be a
finite rank. Let $\mathcal{C}_r$ be the set of equivalence classes of rank r according to $\mathcal{A}$. From
Proposition 2, we have that each class of $\mathcal{C}_r$ is rational. Moreover, considering
the definition of $\sim$, we note that if a word u is the label of a successful path in
$\mathcal{A}$, so is any equivalent word. So an equivalence class is either contained in
L or disjoint from L. We deduce a rational expression of $A^{W_r} \setminus L$ as a finite union
of classes of $\mathcal{C}_r$:

$$A^{W_r} \setminus L = \bigcup_{C \in \mathcal{C}_r,\ C \cap L = \emptyset} C$$

As a conclusion, we mention a question that is left open by this paper. A natural
generalization of our result would be that the class of rational sets of words on
countable scattered linear orderings is closed under complementation.

Abstract. We consider the length L of the longest common subsequence of two
n-character words chosen uniformly and independently at random over a k-ary
alphabet. Subadditivity arguments yield that E[L]/n converges to a constant $\gamma_k$.
We prove a conjecture of Sankoff and Mainville from the early 80's claiming that
$\gamma_k \sqrt{k} \to 2$ as $k \to \infty$.

1 Introduction 

Consider two sequences of length n, with letters from a size k alphabet $\Sigma$, say $\mu$ and $\nu$.
The longest common subsequence (LCS) problem is that of finding the largest value L
for which there are $1 \le i_1 < i_2 < \ldots < i_L \le n$ and $1 \le j_1 < j_2 < \ldots < j_L \le n$
such that $\mu_{i_t} = \nu_{j_t}$, for all t = 1, 2, ..., L.
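
The quantity L defined above can be computed with the classical dynamic-programming recurrence. The Python fragment below is a minimal sketch for illustration only; it is a standard textbook method rather than anything described in this paper, and the function name `lcs_length` is hypothetical.

```python
def lcs_length(mu, nu):
    """Length of a longest common subsequence of the sequences mu and nu.

    prev[j] holds the LCS length of the prefix of mu processed so far
    and the first j letters of nu; only two rows are kept in memory.
    """
    prev = [0] * (len(nu) + 1)
    for a in mu:
        curr = [0]
        for j, b in enumerate(nu, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)           # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1])) # drop a letter of mu or of nu
        prev = curr
    return prev[-1]

example = lcs_length("ABCBDAB", "BDCABA")
```

The running time is proportional to the product of the two lengths, which is enough to experiment with the ratio L/n on random words of moderate length.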

The LCS problem has emerged more or less independently in several remarkably
disparate areas, including the comparison of versions of computer programs,
cryptographic snooping, and molecular biology. The biological motivation of the problem is
that long molecules such as proteins and nucleic acids like DNA can be schematically
represented as sequences from a finite alphabet. Taking an evolutionary point of view,
it is natural to compare two DNA sequences by finding their closest common ancestors.
If one assumes that these molecules evolve only through the process of inserting new
symbols in the representing strings, then ancestors are substrings of the string that
represents the molecule. Thus, the length of the longest common subsequence of two strings
is a reasonable measure of how close both strings are. In the mid 1970's, Chvátal and
Sankoff [6] proved that the expected length of the LCS of two random k-ary sequences
of length n, when normalized by n, converges to a constant. The value of this constant
$\gamma_k$ is unknown although much effort has been spent in finding good upper and lower
bounds for it (see, for example, [3] and references therein). The best known upper and
lower bounds for $\gamma_k$ do not have a closed form. They are obtained either as numeric
approximations to the solutions of a nonlinear equation or as a numeric evaluation of
some series expansion (see [7] for a survey of such results).

Although the problem of determining $\gamma_k$ has a simple statement, it has turned out to
be a challenging mathematical endeavor. Moreover, it is quite naturally motivated. Indeed,
a claim that two DNA sequences of length n are somewhat related makes sense provided
their LCS differs significantly from $\gamma_4 n$ (since DNA sequences have 4 basis elements).
We analyze the behavior of $\gamma_k$ for k tending to infinity, and more generally, we consider
the expected length of the LCS when k is an (arbitrarily slowly growing) function of n
and $n \to \infty$. We confirm a conjecture of Sankoff and Mainville from the early 80's [20]
stating that

$$\lim_{k \to \infty} \gamma_k \sqrt{k} = 2. \qquad (1)$$

(See [19, §6.8] for a discussion of a stronger version, due to Arratia and Steele, of the
above stated conjecture.)

The constant 2 in (1) arises from a connection with the famous longest increasing
sequence (LIS) problem. An increasing subsequence of length L of a permutation $\pi$ of
{1, ..., n} is a sequence $1 \le i_1 < i_2 < \dots < i_L \le n$ such that
$\pi(i_1) < \pi(i_2) < \dots < \pi(i_L)$. A LIS is an increasing subsequence of maximum length. The LIS problem
concerns the determination of the asymptotic behavior, in n, of the length of a LIS of a
randomly and uniformly chosen permutation $\pi$. The LIS problem is also referred to as
“Ulam's problem” (e.g., in [14,4,18]). Ulam is often credited for raising it in [23], where
he mentions (without reference) a “well-known theorem” asserting that given $n^2 + 1$
integers in any order, it is always possible to find among them a monotone subsequence
of n + 1. The theorem is due to Erdős and Szekeres [8]. The discussion in [23] solely
concerns monotonic subsequences of a randomly and uniformly chosen permutation of
$n^2 + 1$ elements. Monte Carlo simulations are reported in [2], where it is observed that
over the range $n \le 100$, the length of the LIS of $n^2 + 1$ randomly chosen
elements, when normalized by n, approaches 2. Hammersley [11] gave a rigorous proof
of the existence of the limit and conjectured it was equal to 2. Later, Logan and Shepp [17],
based on a result by Schensted [21], proved that the limit $\gamma$ satisfies $\gamma \ge 2$; finally,
Vershik and Kerov [24] showed that $\gamma \le 2$. In a major recent breakthrough due to Baik, Deift and Johansson [4],
the asymptotic distribution of the longest increasing sequence random variable has been
determined. For a detailed account of these results, history and related work see the
surveys of Aldous and Diaconis [1] and Stanley [22].
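
The normalization by the square root of the number of elements can be explored numerically. The sketch below is illustrative only: it computes the LIS length by patience sorting and averages it over random permutations; the names `lis_length` and `normalized_lis`, the trial count and the seed are our own choices, not taken from the cited works.

```python
import bisect
import random


def lis_length(seq):
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[i] = smallest possible last value of an increasing run of length i + 1
    for x in seq:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def normalized_lis(num_elements, trials=20, seed=0):
    """Average LIS length of random permutations, divided by sqrt(num_elements)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(num_elements))
        rng.shuffle(perm)
        total += lis_length(perm)
    return total / (trials * num_elements ** 0.5)
```

For moderate sizes the returned ratio stays somewhat below 2, so such experiments only suggest the limit; they do not replace the rigorous results cited above.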



2 Statement of Results 

Henceforth we denote by A and B two disjoint totally ordered sets. We assume that
the elements of A and B are numbered 1, 2, ..., |A| and 1, 2, ..., |B|, respectively. We
denote by r and s the sizes of A and B, respectively. Typically, we have r = s = n.

Throughout the paper we follow standard graph theory notation (for a reference the
reader might consult Bollobás [5]). We let G denote a bipartite graph with color classes
A and B. Two distinct edges ab and a'b' of G are said to be noncrossing if a and a' are
in the same order as b and b'; in other words, if a < a' and b < b', or a' < a and b' < b.
A matching of G is called planar if every distinct pair of its edges is noncrossing. We
let L(G) denote the number of edges of a maximum size planar matching in G (note
that L(G) depends on the graph G and on the ordering of its color classes). We want to
understand the behavior of L(G) for random choices of G, primarily choices according to
the following two models of random graphs:

- The random words model $\Sigma(K_{n,n}; k)$: the distribution over the set of subgraphs of
  $K_{n,n}$ obtained by uniformly and independently assigning each node of $K_{n,n}$ one
  of k characters and keeping those edges whose end-points are associated to equal
  characters.

- The binomial random graph model $G(K_{n,n}; p)$: the distribution over the set of
  subgraphs of $K_{n,n}$ where each edge of $K_{n,n}$ is included with probability p, and
  these events are mutually independent. (This is an obvious modification of the usual
  G(n, p) model for bipartite graphs with ordered color classes.)

For a bipartite graph G over color classes A and B, let

$$L(G) = \max\{L : \exists\, a_1 < \dots < a_L,\ b_1 < \dots < b_L,\ a_i b_i \in E(G),\ 1 \le i \le L\}.$$

Observe that L(G), when G is chosen according to $\Sigma(K_{n,n}; k)$, is precisely the length
of the LCS of the two words, one for each of the color classes of G, corresponding to
the characters associated to the nodes of $K_{n,n}$. Also note that the latter words are uniformly
and independently distributed length n sequences of characters over a k size alphabet.
In other words, the study of $L(\Sigma(K_{n,n}; k))$ is just a re-wording of a similar study of the
LCS of two randomly chosen n length sequences over a size k alphabet. Nevertheless,
it will be more convenient to cast our discussion in the language of graph theory.
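
The correspondence between planar matchings and common subsequences can be made concrete. The following sketch (with hypothetical helper names `word_graph_edges` and `largest_planar_matching`) builds the edge set induced by two words and computes the size of a largest planar matching by reducing it to a longest strictly increasing subsequence; it is one convenient way to compute L(G), not a procedure taken from the paper.

```python
import bisect


def word_graph_edges(mu, nu):
    """Edges (a, b): position a of mu and position b of nu carry equal characters."""
    return [(a, b) for a, x in enumerate(mu) for b, y in enumerate(nu) if x == y]


def largest_planar_matching(edges):
    """Size of a largest set of pairwise noncrossing, vertex-disjoint edges.

    Edges are sorted by a ascending and b descending, so a strictly increasing
    run of b values can use each node of either color class at most once.
    """
    ordered = sorted(edges, key=lambda e: (e[0], -e[1]))
    tails = []
    for _, b in ordered:
        pos = bisect.bisect_left(tails, b)
        if pos == len(tails):
            tails.append(b)
        else:
            tails[pos] = b
    return len(tails)
```

Applied to the graph of two words, `largest_planar_matching(word_graph_edges(mu, nu))` returns the length of their LCS.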

We now argue that L(G) is “subadditive” and from it draw an important conclusion
about its expected asymptotic behavior. Indeed, consider two bipartite graphs G and
G' over disjoint color classes A-B and A'-B', respectively. Denote by $G \oplus G'$ the
bipartite graph over color classes $A \cup A'$-$B \cup B'$. In order for $L(G \oplus G')$ to be well
defined we adopt the convention that the elements of the color classes of G' are strictly larger
than those of the color classes of G. It follows immediately that $L(\cdot)$ is subadditive, i.e.,
$L(G \oplus G') \ge L(G) + L(G')$. Thus, for G and G' chosen according to $\Sigma(K_{n,n}; k)$ and
$\Sigma(K_{m,m}; k)$, respectively,

$$E[L(G \oplus G')] \ge E[L(G)] + E[L(G')].$$

A standard subadditivity argument implies the existence of $\lim_{n \to \infty} E[L(\Sigma(K_{n,n}; k))/n]$.
The same claim holds for the binomial random graph model.
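
For completeness, the standard argument can be sketched as follows. The shorthand $a_n$ is ours; the first inequality uses that $G \oplus G'$, for independent G and G', can be coupled with a subgraph of a graph distributed as $\Sigma(K_{n+m,n+m}; k)$, and the limit statement is Fekete's lemma in its superadditive form.

```latex
% a_n := E[L(\Sigma(K_{n,n};k))] for a fixed alphabet size k (our shorthand).
\[
  a_{n+m} \;\ge\; a_n + a_m \quad (n, m \ge 1),
  \qquad 0 \le a_n \le n .
\]
% Fekete's lemma (superadditive form): a_n / n converges to its supremum,
% which is finite because a_n <= n.
\[
  \lim_{n\to\infty} \frac{a_n}{n} \;=\; \sup_{n \ge 1} \frac{a_n}{n} \;\le\; 1 .
\]
```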

Our main result essentially says that $L(\Sigma(K_{n,n}; k)) \cdot \sqrt{k}/n$ converges to 2 as $k \to \infty$,
provided that n is sufficiently large in terms of k. Specifically,

Theorem 1. For every $\varepsilon > 0$ there exist $k_0$ and C such that for all $k > k_0$ and all n
with $n/\sqrt{k} > C$ we have

$$(1 - \varepsilon) \cdot \frac{2n}{\sqrt{k}} \le E[L(\Sigma(K_{n,n}; k))] \le (1 + \varepsilon) \cdot \frac{2n}{\sqrt{k}}.$$

Moreover, there is an exponentially small tail bound; namely, for every $\varepsilon > 0$ there exists
c > 0 such that for k and n as above,

$$P\left[\,\left|L(\Sigma(K_{n,n}; k)) - \frac{2n}{\sqrt{k}}\right| \ge \varepsilon\,\frac{2n}{\sqrt{k}}\right] \le e^{-cn/\sqrt{k}}.$$
Corollary 1. The limit $\gamma_k = \lim_{n \to \infty} E[L(\Sigma(K_{n,n}; k))/n]$ exists, and $\lim_{k \to \infty} \gamma_k \sqrt{k} = 2$.

The focus on the case where $k \to \infty$ is partly inspired by [15]. There, it is shown
that $L(G)/\sqrt{dn} \to 2$ in probability as $n \to \infty$ provided $d = o(n^{1/4})$ and G is a
uniformly chosen d-regular subgraph of $K_{n,n}$. Under the $d = o(n^{1/4})$ condition, any
node of the d-regular bipartite graph can potentially be matched to a $d/n \to 0$ fraction
of the other color class nodes. In the case of interest here, that is the LCS problem with
$k \to \infty$, it also happens that any character of one sequence can be matched to an expected
$1/k \to 0$ fraction of the other sequence's characters. Both in this work and in [15], the
vanishing fraction of (expected) potential matches is a key issue. Indeed, this is where
the connection with the LIS problem arises. To clarify this point, suppose G is chosen
according to $\Sigma(K_{n,n}; k)$ and assume $n \ll k$. Then, an easy calculation shows that the
expected number of edges of G is $n^2/k$ and that the average degree of a node is $n/k \ll 1$.
For $n^2/k \gg 1$, it turns out that, disregarding degree 0 nodes, G is essentially a perfect
matching on approximately $n^2/k$ nodes. In other words, essentially a permutation $\pi$ on
approximately $n^2/k$ elements! Moreover, a LIS of $\pi$ is a planar matching in the original
graph G. It turns out that the length of a LIS of $\pi$ is in fact very close to L(G).

Here is an outline of the paper. First, we state in Section 3, the estimate for the length 
of a LIS of a uniformly chosen permutation on which we rely. Then in Section 4, we 
formalize the claim of the previous paragraph. In Sections 5 and 6, we handle the case
where n is not small in comparison with k. In these sections we re-establish, respectively,
the lower and upper bounds derived in Section 4, but lifting the constraint that n be
“small”. This completes the proof of our main result. The gist of the paper is the material
of Section 6 showing how the upper bound for “small” values of n is used to obtain
upper bounds without this latter restriction on n. In order to simplify the exposition, we
focus mainly on the random words model. Nevertheless, in Section 7 we discuss other 
models, among them the binomial random graph model. Thus we hope to convey, to 
some extent, that our proof arguments can be successfully adapted to a wider class of 
probabilistic distributions over bipartite graphs. Due to space considerations, basically 
all proofs are omitted from this extended abstract. Full proofs can be found in [16]. 



3 Tools 



The crucial ingredient in our proofs is a sufficiently precise result on the distribution
of the length of the longest increasing subsequence in a random permutation. We state
a remarkably strong result of Baik, Deift and Johansson [4, eqn. (1.7) and (1.8)] (our
formulation is slightly weaker than theirs, in order to make the statement simpler). A much
weaker tail bound than the one they provide would actually suffice for our proof (e.g.,
Frieze's [9] LIS concentration result).

Theorem 2. Let $LIS_N$ be the length of the longest increasing subsequence of a randomly
chosen permutation of {1, ..., N}. There are positive constants $B_0$, $B_1$, and c such that
for every $\lambda$ with $\ldots \le \lambda \le \sqrt{N} - 2$,

$$P\left[LIS_N \ge 2\sqrt{N} + \lambda\sqrt{N}\right] \le B_1 \exp(\ldots)$$

and for every $\lambda$ with $\ldots \le \lambda \le 2$,

$$P\left[LIS_N \le 2\sqrt{N} - \lambda\sqrt{N}\right] \le B_1 \exp\left(-c\lambda^3 N\right).$$

4 Small Graphs 

In this section we derive a result essentially saying that Theorem 1 holds if k is sufficiently 
large in terms of n. For technical reasons, we also need to consider bipartite graphs with 
color classes of unequal sizes. 

Proposition 1. For every $\delta > 0$, there exists a (large) positive constant C such that:

(i) If $rs \ge Ck$ and $(r + s)\sqrt{rs} \le \delta k^{3/2}/6$, then with $m_u = m_u(r, s) = 2(1 + \delta)\sqrt{rs/k}$,
for all $t \ge 0$,

$$P[L(\Sigma(K_{r,s}; k)) \ge m_u + t] \le \ldots$$

(ii) If $rs \ge Ck$ and $r + s \le \delta k/6$, then with $m_u$ as above and $m_l = m_l(r, s) = 2(1 - \delta)\sqrt{rs/k}$,
for all $t \ge 0$,

$$P[L(\Sigma(K_{r,s}; k)) \le m_l - t] \le \ldots$$

The idea behind the proof of Proposition 1 is simple: we show that (ignoring degree 0
nodes) a graph G chosen according to $\Sigma(K_{r,s}; k)$ is “almost” a matching, and the size of the
largest planar matching in a random matching corresponds precisely to the length of a
LIS in a randomly chosen permutation.

First, we deal with the (usually few) nodes of G of degree larger than one. To this
end, we define a graph G' obtained from G by removing all edges incident to nodes of
degree at least 2. Throughout, E and E' denote E(G) and E(G'), respectively.

Ignoring degree 0 nodes, G' is clearly a matching on its end-points; equivalently,
it is a permutation of {1, ..., |E'|}. Theorem 2 thus gives us an estimate of L(G') in
terms of $|E'| = |E| - |E \setminus E'|$. But $L(G') \le L(G) \le L(G') + |E \setminus E'|$. Hence, good
estimates on |E| and $|E \setminus E'|$, coupled with the aforementioned estimate of L(G'), yield
the sought after bounds on L(G).
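
The construction of G' and the resulting sandwich bound can be checked on small random instances. This is a sketch under our own naming (`random_words_graph`, `prune_high_degree`, `largest_planar_matching` are hypothetical helpers); the planar matching size is computed through a longest increasing subsequence, which is an implementation choice rather than part of the proof.

```python
import bisect
import random
from collections import Counter


def random_words_graph(r, s, k, rng):
    """Edge list of a graph drawn from the random words model on K_{r,s}."""
    mu = [rng.randrange(k) for _ in range(r)]
    nu = [rng.randrange(k) for _ in range(s)]
    return [(a, b) for a in range(r) for b in range(s) if mu[a] == nu[b]]


def prune_high_degree(edges):
    """Edges of G': drop every edge incident to a node of degree at least 2."""
    deg_a = Counter(a for a, _ in edges)
    deg_b = Counter(b for _, b in edges)
    return [(a, b) for a, b in edges if deg_a[a] == 1 and deg_b[b] == 1]


def largest_planar_matching(edges):
    """Size of a largest planar matching, via a strictly increasing subsequence."""
    ordered = sorted(edges, key=lambda e: (e[0], -e[1]))
    tails = []
    for _, b in ordered:
        pos = bisect.bisect_left(tails, b)
        if pos == len(tails):
            tails.append(b)
        else:
            tails[pos] = b
    return len(tails)
```

For any edge list `E` and `E_prime = prune_high_degree(E)`, the value `largest_planar_matching(E)` lies between `largest_planar_matching(E_prime)` and that same value plus `len(E) - len(E_prime)`.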

It follows easily that E[|E|] = rs/k. A simple second-moment argument (Chebyshev's
inequality) suffices to obtain an estimate of |E|.

Lemma 1. For every $\eta > 0$, we have
$$P\left[\,\left||E| - \frac{rs}{k}\right| \ge \eta \cdot \frac{rs}{k}\right] \le \frac{1}{\eta^2 (rs/k)}.$$

Based on a Markov bound we estimate $|E \setminus E'|$. For this we require the following.

Lemma 2. We have $E[|E \setminus E'|] \le (r + s)rs/k^2$.
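
One way to see why a bound of this form is plausible is a union bound over the other nodes that could share a character with an end-point of a given edge. The derivation below is our own sketch and is not claimed to be the proof given in [16].

```latex
% An edge ab lies in E \setminus E' iff ab \in E and a or b has a second neighbour.
% Given ab \in E, each other node of the opposite class carries the same
% character with probability 1/k.
\[
  \Pr[ab \in E \setminus E']
  \;\le\; \Pr[ab \in E]\cdot\Big(\frac{s-1}{k} + \frac{r-1}{k}\Big)
  \;\le\; \frac{1}{k}\cdot\frac{r+s}{k},
\]
\[
  \mathbf{E}\big[\,|E \setminus E'|\,\big]
  \;=\; \sum_{a \in A,\; b \in B} \Pr[ab \in E \setminus E']
  \;\le\; \frac{(r+s)\,rs}{k^{2}} .
\]
```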

Although the underlying idea of the proof of Proposition 1 should be clear from this 
section’s discussion, there are technicalities involved in it (see [16] for details). One key 
technical aspect is the use of Talagrand’s inequality [12, Theorem 2.29] which provides 
a concentration for $L(\Sigma(K_{r,s}; k))$ around one of its medians. The remaining part of
the argument consists in estimating, based on Theorem 2 and Lemmas 1 and 2, the 
magnitude of one such median. 
5 The Lower Bound in Theorem 1



In this section we establish the lower bound on the expectation of $L(\Sigma(K_{n,n}; k))$ and
the lower tail bound for its distribution.

Given $\varepsilon$, let $\delta > 0$ be such that $(1 - 2\delta)^2 = 1 - \varepsilon$, and let $C = C(\delta)$ be as in
Proposition 1. Fix $\tilde{C} > \sqrt{C}$ large enough so that

Let $\tilde{n}(k) = \tilde{n} = \lfloor \delta k/12 \rfloor$. Proposition 1 applies for $k \ge k_0$, where $k_0$ is such that
$\tilde{n}(k_0) \ge \tilde{C}\sqrt{k_0}$. It follows that



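$$E[L(\Sigma(K_{\tilde{n},\tilde{n}}; k))] \ge (1 - 2\delta) \cdot \frac{2\tilde{n}}{\sqrt{k}} \cdot P\left[L(G) \ge 2(1 - 2\delta)\frac{\tilde{n}}{\sqrt{k}}\right] \ge (1 - 2\delta) \cdot \frac{2\tilde{n}}{\sqrt{k}} \left(1 - 2\exp\left(-\frac{\delta^2}{4(1 + \delta)} \cdot \frac{\tilde{n}}{\sqrt{k}}\right)\right).$$

Since, as already mentioned, $E[L(\Sigma(K_{n,n}; k))]$ is subadditive, $E[L(\Sigma(K_{n,n}; k))/n]$ is
non-decreasing. The desired lower bound on the expectation follows.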

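Now we establish the lower tail bound. First, we redefine $\tilde{n} = \lceil \tilde{C}\sqrt{k} \rceil$ and let
$q = \lfloor n/\tilde{n} \rfloor$. Moreover, we let G be chosen according to $\Sigma(K_{n,n}; k)$ and let $G_i$ be
the subgraph induced in G by the nodes $(i - 1) \cdot \tilde{n} + 1, \dots, i \cdot \tilde{n}$ in each color class,
$i = 1, \dots, q$. We observe that $L(G_1), \dots, L(G_q)$ are independent and identically distributed,
each $G_i$ having distribution $\Sigma(K_{\tilde{n},\tilde{n}}; k)$, and $L(G) \ge L(G_1) + \dots + L(G_q)$. Let $\mu = E[L(G_1)]$
and $t = \varepsilon(2n/\sqrt{k})$. Since $n < (q + 1)\tilde{n}$, the lower bound on $\mu$ proved above yields that

$$P\left[L(G) \le (1 - 3\varepsilon) \cdot \frac{2n}{\sqrt{k}}\right] \le P\left[\sum_{i=1}^{q} L(G_i) \le q\mu - t + (\mu - t)\right].$$

An argument similar to the one used above to derive the bound $\mu \ge (1 - \varepsilon)2\tilde{n}/\sqrt{k}$ can
be used to obtain $\mu \le (1 + \varepsilon)2\tilde{n}/\sqrt{k}$ from Proposition 1. Let n be large enough so that
$n \ge \tilde{n}(1 + 2\varepsilon)/\varepsilon$. Thus, $q \ge (1 + \varepsilon)/\varepsilon$ and $t \ge \varepsilon q\mu/(1 + \varepsilon) \ge \mu$. Hence, a standard
Chernoff bound [12, Theorem 2.1] implies that

$$P\left[L(G) \le (1 - 3\varepsilon) \cdot \frac{2n}{\sqrt{k}}\right] \le P\left[\sum_{i=1}^{q} L(G_i) \le q\mu - t\right] \le \exp\left(-\frac{\varepsilon^2}{2(1 + \varepsilon)} \cdot \frac{2n}{\sqrt{k}}\right).$$

This establishes the sought after lower tail bound.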



6 The Upper Bound in Theorem 1 

We will only discuss the tail bound since $L(\Sigma(K_{n,n}; k)) \le n$ always, and so the claimed
estimate for the expectation follows from the tail bound.

Let $\varepsilon > 0$ be fixed. We choose a sufficiently small $\delta = \delta(\varepsilon) > 0$, much smaller
than $\varepsilon$. Requirements on $\delta$ will be apparent from the subsequent proof.

Henceforth, we fix constants $1/2 < \alpha < \beta < 3/4$ (any choice of $\alpha$ and $\beta$ in the
specified range would suffice for our purposes). In this section, we will always assume
that $k \ge k_0$ for a sufficiently large integer $k_0 = k_0(\varepsilon)$, and that n is sufficiently large
compared to k: $n \ge k^{\beta}$, say. Note that for $n < k^{\beta}$ (and k sufficiently large), the tail
bound of Theorem 1 follows from Proposition 1.

Below, we first introduce the notion of a block partition associated to a “large” planar
matching. We then classify block partitions into different types. Finally, we show that
there are not too many different types, and that there is a very small probability that
a graph chosen according to $\Sigma(K_{n,n}; k)$ is of a given fixed type. A bound on the
probability of a “large” planar matching occurring immediately follows. This provides
us with the sought after upper tail bound.

Block partitions. Let us write $m_{max} = (1 + \varepsilon) \cdot (2n/\sqrt{k})$ for the upper bound on
the expected size of a planar matching as in Theorem 1. We also define an auxiliary
parameter $\ell = k^{\alpha}$. This is a somewhat arbitrary choice (but given by a simple formula).
The essential requirements on $\ell$ are that $\ell$ be much larger than $\sqrt{k}$ and much smaller
than $k^{3/4}$. We note that $n/\ell$ is large by our assumption $n \ge k^{\beta}$.

Let M be a planar matching with $m_{max}$ edges on the sets A and B, |A| = |B| = n.
We define a partition of M into blocks of consecutive edges. There will be roughly $n/\ell$
blocks, each of them containing at most

$$e_{max} = \frac{1}{\delta} \cdot \frac{\ell}{n} \cdot m_{max}$$

edges of M. So $e_{max}$ is of order $\ell/\sqrt{k}$, which by our assumptions can be assumed to
be larger than any prescribed constant. Moreover, we require that no block is “spread”
over more than $\ell$ consecutive nodes in A or in B.

Formally, the ith block of the partition will be specified by nodes $a_i, a'_i \in A$ and
$b_i, b'_i \in B$; $a_i b_i \in M$ is the first edge in the block and $a'_i b'_i \in M$ is the last edge (the
block may contain only one edge, and so $a_i b_i = a'_i b'_i$ is possible). The edge $a_1 b_1$ is the
first edge of M, and $a_{i+1} b_{i+1}$ is the edge of M immediately following $a'_i b'_i$. Finally,
given $a_i b_i$, the edge $a'_i b'_i$ is taken as the rightmost edge of M such that

- the ith block has at most $e_{max}$ edges of M, and

- $a'_i - a_i \le \ell$ and $b'_i - b_i \le \ell$ (here and in the sequel, with a little abuse of notation,
we regard the nodes in A and those in B as natural numbers 1, 2, ..., n, although
of course, the nodes in A are distinct from those of B).
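
The greedy rule above can be sketched in code as follows. The function name `block_partition` and the representation of a matching as a list of (a, b) pairs are our own assumptions, and the two constraints are read as non-strict inequalities.

```python
def block_partition(matching, ell, e_max):
    """Split a planar matching into consecutive blocks using the greedy rule.

    matching: iterable of (a, b) pairs forming a planar matching.
    ell:      maximum spread of a block in either color class.
    e_max:    maximum number of edges per block (assumed to be at least 1).
    """
    edges = sorted(matching)  # planar, so sorting by a also sorts by b
    blocks = []
    start = 0
    while start < len(edges):
        a_first, b_first = edges[start]
        end = start
        # Extend the block while the next edge respects both constraints.
        while (end + 1 < len(edges)
               and end + 1 - start < e_max
               and edges[end + 1][0] - a_first <= ell
               and edges[end + 1][1] - b_first <= ell):
            end += 1
        blocks.append(edges[start:end + 1])
        start = end + 1
    return blocks
```

Because the end-points increase along the matching, extending a block edge by edge stops exactly at the rightmost admissible edge.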

Let q denote the number of blocks obtained in this way. It is easily seen that $q = O(n/\ell)$.
A block partition is schematically illustrated in Fig. 1.

Counting the types. Let $e_i$ be the number of edges of M in the ith block. Let us call
the 5q-tuple $T = (a_1, a'_1, b_1, b'_1, e_1, \dots, a_q, a'_q, b_q, b'_q, e_q)$ the type of the block partition
of M, and let us write T = T(M). Let $\mathcal{T}$ denote the set of all possible types of block
partitions of planar matchings as above.
Fig. 1. A block partition.



The probability of a matching with a given type of block partition. Relying on the
fact that blocks are small graphs similar to those dealt with in Section 4, we show that for
every fixed type T, the probability that our random choice contains a planar matching
of size $m_{max}$ with that type of block partition is very small.

Lemma 4. Let n and k be as above. For any given type $T \in \mathcal{T}$, the probability $p_T$ that
the random graph $\Sigma(K_{n,n}; k)$ contains a planar matching M with $m_{max}$ edges and
with T(M) = T satisfies

$$p_T \le \exp\left(-c\,\varepsilon^2 \delta\,\frac{n}{\sqrt{k}}\right)$$

with a suitable absolute constant c > 0.

Proof of Theorem 1. We have

$$P[L(\Sigma(K_{n,n}; k)) \ge m_{max}] \le \sum_{T \in \mathcal{T}} p_T \le |\mathcal{T}| \cdot \max_{T \in \mathcal{T}} p_T.$$

From Lemmas 3 and 4 follows the sought after estimate. □



7 Extensions 

One can prove results for the $G(K_{n,n}; p)$ model analogous to those obtained in previous
sections (essentially, k is now replaced by 1/p):

Theorem 3. For every $\varepsilon > 0$ there exist constants $p_0 \in (0, 1)$ and C such that for all
$p < p_0$ and all n with $n\sqrt{p} > C$ we have

$$(1 - \varepsilon) \cdot 2n\sqrt{p} \le E[L(G(K_{n,n}; p))] \le (1 + \varepsilon) \cdot 2n\sqrt{p}.$$

Moreover, there is an exponentially small tail bound; namely, for every $\varepsilon > 0$ there exists
c > 0 such that for p and n as above,

$$P\left[\,\left|L(G(K_{n,n}; p)) - 2n\sqrt{p}\right| \ge \varepsilon\, 2n\sqrt{p}\right] \le e^{-cn\sqrt{p}}.$$
Subadditivity arguments yield that $E[L(G(K_{n,n}; p))/n]$ converges to a constant $\Delta_p$ as
$n \to \infty$. The previous theorem thus implies that $\Delta_p/\sqrt{p} \to 2$ as $p \to 0$. Also, similar
results hold for the $G(K_{r,s}; p)$ model as those derived for $\Sigma(K_{r,s}; k)$. Specifically,

Proposition 2. For every $\delta > 0$, there exists a (large) positive constant C such that:

(i) If $rs \ge C/p$ and $(r + s)\sqrt{rs} \le \delta/(6p^{3/2})$, then with $m_u = m_u(r, s) = 2(1 + \delta)\sqrt{rsp}$,
for all $t \ge 0$,

$$P[L(G(K_{r,s}; p)) \ge m_u + t] \le \ldots$$

(ii) If $rs \ge C/p$ and $r + s \le \delta/(6p)$, then with $m_u$ as above and $m_l = m_l(r, s) = 2(1 - \delta)\sqrt{rsp}$,
for all $t \ge 0$,

$$P[L(G(K_{r,s}; p)) \le m_l - t] \le 2e^{-t^2/(8m_u)}.$$

In [13], Johansson implicitly considers a model somewhat related to the $G(K_{n,n}; p)$
model. Specifically, a distribution $G^*(K_{n,n}; p)$ over weighted instances of $K_{n,n}$. The
weight of each edge is a geometrically distributed random variable taking the value
$k \in \mathbb{N}$ with probability $(1 - p)p^k$, and the edge weights are mutually independent.
Denoting the maximum weight planar matching of an instance drawn according to
$G^*(K_{n,n}; p)$ by $L(G^*(K_{n,n}; p))$, Johansson's result [13, Theorem 1.1] says that for
all $p \in (0, 1)$,

lim -•E[L(G*(iT„,„;p))] = --(l + v^r^)2. 

n->oo n p 

Note that an instance G of $G(K_{n,n}; p)$ can be obtained from one drawn according to
$G^*(K_{n,n}; p)$ by including in G only those edges of $K_{n,n}$ with nonzero weight. Hence,

$$E[L(G(K_{n,n}; p))] \le E[L(G^*(K_{n,n}; p))].$$

It follows that Ap < (1 + jp for allp G (0, 1). We shall see below that known 

results imply a much stronger bound on Ap for not too large values of p. 

Gravner, Tracy and Widom [10] consider processes associated to random (0, 1)-matrices
where each entry takes the value 1 with probability p, independent of the
values of other matrix entries. In particular they study a process called oriented digital
boiling (ODB) and analyze the behavior of a so called height function which equals, in
distribution, the longest sequence of positions $(i_l, j_l)$ in a random (0, 1)-matrix of size
n × n which have entry 1 such that the $i_l$'s are increasing and the $j_l$'s are non-decreasing.
In contrast, $L(G(K_{n,n}; p))$ equals in distribution the longest such sequence with both
$i_l$'s and $j_l$'s increasing. This latter model is referred to as strict oriented digital boiling
in [10], but no results are claimed for it. Clearly, an ODB process dominates a
strict ODB process. Hence, [10, §3, (1)] implies that for any p < 1/2,

Ap<Kp-.= lim - • E[T(G(AT„,„;p))] = 2 -\/p(T^^, 

n^oo n 

which in turn implies that $\limsup_{p \to 0} \Delta_p/\sqrt{p} \le 2$. Nevertheless, our derivation of this
latter limit value is elementary in comparison with the highly technical nature of [10].
Abstract. We define the universal type class of an individual sequence
$x^n$, in analogy to the classical notion used in the method of types of
information theory. Two sequences of the same length are said to be of 
the same universal (LZ) type if and only if they yield the same set of 
phrases in the incremental parsing of Ziv and Lempel (1978). We show 
that the empirical probability distributions of any finite order k of two 
sequences of the same universal type converge, in the variational sense, 
as the sequence length increases. Consequently, the logarithms of the 
probabilities assigned by any k-th order probability assignment to two
sequences of the same universal type converge, for any k. We estimate 
the size of a universal type class, and show that its behavior parallels 
that of the conventional counterpart, with the LZ78 code length playing 
the role of the empirical entropy. We present efficient procedures for 
enumerating the sequences in a universal type class, and for drawing a 
sequence from the class with uniform probability. As an application, we 
consider the problem of universal simulation of individual sequences. A 
sequence drawn with uniform probability from the universal type class 
of $x^n$ is a good simulation of $x^n$ in a well defined mathematical sense.



1 Introduction 

Let A be a finite alphabet of cardinality $|A| \ge 2$. We denote by $x_j^k$ the sequence
$x_j x_{j+1} \dots x_k$, $x_i \in A$, $j \le i \le k$, with the subscript j sometimes omitted from $x_j^k$
when j = 1. If j > k, $x_j^k = \lambda$, the null string. The terms “string” and “sequence”
are used interchangeably; we denote by |w| the length of a string $w \in A^*$, and
by vw the concatenation of $v, w \in A^*$.

The method of types [1,2,3] has proven very useful in deriving results in 
source and channel coding. Although often discussed in the memoryless setting, 
the method generalizes readily to wider classes of parametric probability distributions
on sequences over discrete alphabets. Specifically, consider a class P of
probability distributions $P_\theta$ on $A^n$, $n \ge 1$, parametrized by a finite-dimensional
vector $\theta$ of real-valued parameters. The type class of $x^n$ with respect to P is
the set of all sequences $y^n$ such that $P_\theta(x^n) = P_\theta(y^n)$ for all admissible values
of the parameter vector $\theta$. Generally, type classes are characterized by a set of
empirical statistics, whose structure is determined by the class P. For example,
in the case where the components of $x^n$ are independent and identically distributed
(i.i.d.), and the class P is parametrized by the |A| − 1 free parameters
corresponding to the probabilities of individual symbols from A, the type class
of $x^n$ consists of all sequences that have the same single-symbol empirical distribution
as $x^n$ [1]. Type classes for families of memoryless distributions with
more elaborate parametrizations are discussed in [4]. In the case of finite memory
(Markov) distributions of a given order k, empirical joint distributions of order
k + 1 determine the type classes [3].

In all the cases mentioned, to define the type classes, one needs knowledge of
the structure (e.g., number of parameters) of P. In this paper, we define a notion
of universal type that does not require such knowledge. The universal type class
of $x^n$ will be characterized, as in the conventional case, by the combinatorial
structure of $x^n$. Rather than explicit symbol counts, however, we will base the
characterization on the data structure built by a universal data compression
scheme, namely, the variant of Lempel-Ziv compression described in [5], often
referred to as LZ78.^1

The incremental parsing rule [5] parses the string $x^n$ as $x^n = p_0 p_1 p_2 \dots p_c t_x$,
where $p_0 = \lambda$, and the phrase $p_i$, $1 \le i \le c$, is the shortest substring of $x^n$
starting at the point following $p_{i-1}$ such that $p_i \ne p_j$ for all j < i ($x_1$ is assumed
to follow $p_0$). The substring $t_x$, referred to as the tail of $x^n$, is a (possibly empty)
suffix for which the parsing rule was truncated due to the end of the string $x^n$.
Conversely, we refer to the prefix $p_1 p_2 \dots p_c$ as the head of $x^n$. Notice that all
the phrases are distinct, and $t_x$ must be equal to one of the phrases, for otherwise
an additional phrase could have been parsed. Clearly, the number of phrases is a
function $c(x^n)$ of the input sequence, but we shall omit its argument when clear
from the context.
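
The parsing rule can be sketched in a few lines. The function name `incremental_parse` and the use of Python strings for sequences are our own choices for illustration.

```python
def incremental_parse(x):
    """LZ78 incremental parsing of the string x.

    Returns (phrases, tail): the list of phrases p_1, ..., p_c in order of
    appearance, and the (possibly empty) tail left when the input ends
    before a new phrase is completed.
    """
    seen = {""}      # p_0, the null string
    phrases = []
    current = ""
    for symbol in x:
        current += symbol
        if current not in seen:
            # Shortest prefix of the remaining input that is not yet a phrase.
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases, current
```

Whatever remains in `current` at the end of the input is the tail; by construction it coincides with an earlier phrase, or with the null string when the head covers the whole input.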

Let $\mathcal{P}_{x^n} = \{p_1, p_2, \dots, p_c\}$ denote the set of phrases in the parsing of $x^n$.
We define the universal (LZ) type class (in short, UTC) of $x^n$, denoted $\mathcal{T}_{x^n}$, as
the set

$$\mathcal{T}_{x^n} = \{\, y^n \in A^n : \mathcal{P}_{y^n} = \mathcal{P}_{x^n} \,\}.$$

For arbitrary strings $u^k$, $v^m$, $m \ge k \ge 1$, let

$$N(u^k, v^m) = |\{\, i : v_i^{i+k-1} = u^k,\ 1 \le i \le m - k + 1 \,\}|$$

denote the number of (possibly overlapping) occurrences of $u^k$ in $v^m$. Denote
the empirical (joint) distribution of order k, $1 \le k \le n$, of $x^n$ by $\hat{P}^{(k)}_{x^n}$, with
$\hat{P}^{(k)}_{x^n}(u^k) = N(u^k, x^n)/(n - k + 1)$, $u^k \in A^k$. A fundamental property of the UTC
is given in the following theorem, proved in Section 2.



^1 Similar notions of universal type can be defined also for other universal compression
schemes, e.g., Context [6]. Presently, however, the LZ78 scheme appears more
amenable to a combinatorial characterization of its type classes.
Theorem 1. Let $x^n$ be an arbitrary sequence of length n, and k a fixed positive
integer. If $y^n \in \mathcal{T}_{x^n}$, then, for all $u^k \in A^k$, we have^2

$$\hat{P}^{(k)}_{x^n}(u^k) - \hat{P}^{(k)}_{y^n}(u^k) = o(1) \quad \text{as } n \to \infty. \qquad (1)$$

A k-th order (finite-memory) probability assignment $Q_k$ is defined by a set of
conditional probability distributions $Q_k(u_{k+1} \mid u_1^k)$, $u^{k+1} \in A^{k+1}$, and a distribution
$Q_k(x_1^k)$ on the initial state, so that $Q_k(x_1^n) = Q_k(x_1^k) \prod_{i=k+1}^{n} Q_k(x_i \mid x_{i-k}^{i-1})$.
In particular, $Q_k$ could be defined by the k-th order approximation of an ergodic
measure [2]. The following is an immediate consequence of Theorem 1.

Corollary 1. Let $x^n$ and $y^n$ be sequences such that $y^n \in \mathcal{T}_{x^n}$. Then, for any
nonnegative integer k, and any k-th order probability assignment $Q_k$ such that
$Q_k(x^n) \ne 0$ and $Q_k(y^n) \ne 0$, we have $\frac{1}{n} \log \frac{Q_k(x^n)}{Q_k(y^n)} = o(1)$ as $n \to \infty$.

Theorem 1 and Corollary 1 are universal analogues of well known properties 
of classical types. In a classical type class, all the sequences in the class have 
the same empirical distribution (relative to the model class defining the type, 
e.g., k-th order joint empirical distributions for (k−1)-st order finite-memory),
and they are assigned identical probabilities by any distribution from the model 
class. In a sense, both properties mean that sequences from the same type class 
are statistically “indistinguishable” by distributions in the model class. In the 
universal type case, “same empirical distribution” and “identical probabilities” 
are weakened to asymptotic notions, i.e. “equal in the limit,” but they hold 
for any model order. The weakened “indistinguishability” is the price paid for 
universality. 

For simplicity, we will focus on the case of binary sequences, i.e., A = {0, 1}. 
All the principles and main results presented carry without difficulty (albeit 
with increased notational complexity) to other finite alphabets. The rest of this 
extended summary is organized as follows. In Section 2 we prove Theorem 1, 
and we analyze the structure and size of UTCs. In Section 3, we present a 
procedure for drawing a random element with uniform probability from a UTC, 
and describe an application of UTCs to the universal simulation of individual 
sequences. Proofs for all the results in this extended summary are presented 
in [7]. The full version also discusses additional properties of universal types, 
such as the number of UTCs for a given sequence length n, which, contrary to 
the conventional finite-parametric case, is not polynomial in n. These discussions 
are omitted here due to length constraints. 

2 The Universal Type Class of $x^n$

If $y^n \in \mathcal{T}_{x^n}$, then $y^n$ parses into the same set of phrases as $x^n$. However, the
phrases need not (and, except when $y^n = x^n$, will not) occur in the same order
in $y^n$ as they do in $x^n$. Also, the tail of $y^n$ may be any phrase $t_y \in \mathcal{P}_{x^n}$ of length
$|t_y| = |t_x| = n - n_T(x^n)$, where

$$n_T(x^n) = |p_1| + |p_2| + \dots + |p_c|. \qquad (2)$$

^2 The asymptotic language in this statement, and others in the sequel, should be
interpreted as relating to any infinite sequence of sequences $x^n$, one for each length
n, and not necessarily related by prefix relations.



Example 1. Consider the string $x^8 = 10101100$, with n = 8. The incremental
parsing for $x^8$ is 1, 0, 10, 11, 00, with c = 5, $n_T(x^8) = 8$, and a null tail. The
sequence $y^8 = 01001011$ is parsed into 0, 1, 00, 10, 11, defining the same set of
phrases as $x^8$. Thus, $x^8$ and $y^8$ are in the same UTC.
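
For short sequences, a UTC can be enumerated by brute force, which makes the example easy to check. This sketch is only meant for tiny n, since it scans all sequences of the given length; the helper names are ours, and it is unrelated to the efficient enumeration procedures mentioned in the abstract.

```python
from itertools import product


def incremental_parse(x):
    """LZ78 incremental parsing: returns (list of phrases, tail)."""
    seen = {""}
    phrases, current = [], ""
    for symbol in x:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases, current


def universal_type_class(x, alphabet="01"):
    """All sequences of length len(x) whose parsing yields the same phrase set as x."""
    target = set(incremental_parse(x)[0])
    members = []
    for symbols in product(alphabet, repeat=len(x)):
        y = "".join(symbols)
        if set(incremental_parse(y)[0]) == target:
            members.append(y)
    return members
```

For the string of Example 1 the class returned by `universal_type_class("10101100")` contains both sequences of the example.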

2.1 Proof of Theorem 1 

Proof. We claim that the following inequalities hold for $x^n$:

$$\sum_{j=1}^{c} N(u^k, p_j) \le N(u^k, x^n) \le \sum_{j=1}^{c} N(u^k, p_j) + (k - 1)c + |t_x|. \qquad (3)$$

The first inequality follows from the fact that the phrases are distinct, and
they parse the head of $x^n$. The second inequality follows from the fact that an
occurrence of $u^k$ is either completely contained in a phrase of the parsing, or it
spans a phrase boundary, or it is contained in the tail of $x^n$. To span a boundary,
an occurrence of $u^k$ must start at most k − 1 locations before the end of a phrase.
Clearly, the inequalities in (3) hold also for any sequence $y^n \in \mathcal{T}_{x^n}$, since they
all define the same set of phrases as $x^n$, and have tails of the same length. Thus,
it follows from (3) that

$$|N(u^k, x^n) - N(u^k, y^n)| \le (k - 1)c + |t_x|, \quad \forall\, y^n \in \mathcal{T}_{x^n}. \qquad (4)$$

It is well known (cf. [5,2]) that for the LZ78 incremental parsing, the number of
phrases satisfies $c \le n/(\log n - o(\log n))$, and the length $\ell$ of any phrase (and,
thus, also $|t_x|$) satisfies $\ell \le \sqrt{2n} - o(\sqrt{n})$. Hence, we have $(k - 1)c + |t_x| = o(n)$
for fixed k, and the claim of the theorem follows from (4), after normalization
by n − k + 1. □
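
The quantities in the proof can be computed directly for small examples. The sketch below (helper names are ours) counts overlapping occurrences of k-grams, forms the empirical distribution of order k, and reports the largest count gap between two sequences, which for members of the same UTC can be compared with the right-hand side of (4).

```python
from collections import Counter


def kgram_counts(x, k):
    """Number of (possibly overlapping) occurrences of every k-gram in x."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))


def empirical_distribution(x, k):
    """Empirical joint distribution of order k: counts divided by n - k + 1."""
    total = len(x) - k + 1
    return {u: count / total for u, count in kgram_counts(x, k).items()}


def max_count_gap(x, y, k):
    """Largest absolute difference between the k-gram counts of x and y."""
    cx, cy = kgram_counts(x, k), kgram_counts(y, k)
    return max(abs(cx[u] - cy[u]) for u in set(cx) | set(cy))
```

For the two sequences of Example 1 and k = 2, the gap is well within the bound (k − 1)c + |t_x| = 5.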

2.2 The Size of the Universal Type Class 

The set of phrases in the incremental parsing of $x^n$ is best represented by means
of a rooted, ordered binary parsing tree where each node represents a phrase,
and each branch is labeled with a binary symbol. The phrase associated with a
node is the concatenation of the edge labels on the path from the root (associated
with $\lambda$) to the node. The number of nodes in the tree is $c(x^n) + 1$, and its path
length [8] is $n_T(x^n)$ as defined in (2), which depends only on the tree, a fact
we will emphasize by omitting the argument $x^n$. All the sequences in a UTC
share the same parsing tree T, which can serve as a canonical representation
of the type class. In general, a complete specification of the UTC requires also
the sequence length n, since the same parsing tree T might result from parsing
sequences of different lengths, due to possibly different tail lengths. For a given
tree T, n can vary from $n_T$ to $n_T + \ell_{max}$, where $\ell_{max}$ is the maximal depth
of a leaf of T. When $n_T = n$ (i.e., $x^n$ has a null tail), we say that $\mathcal{T}_{x^n}$ is the
natural UTC associated with $T_{x^n}$. If T is an arbitrary parsing tree, we denote
the associated natural UTC by $\mathcal{T}(T)$, without reference to a specific string in the
type class. We call a valid pair (T, n) the universal type (UT) of the sequences
in the corresponding UTC. When n is not specified, the natural UT is assumed.
An example of a parsing tree, corresponding to the string of Example 1, is
shown in Figure 1. In the example, $n = n_T$, and, thus, $\mathcal{T}(T) = \mathcal{T}_{x^8}$.




Fig. 1. Parsing tree for $x^8 = 10101100$



Each sequence in the UTC of $x^n$ is determined by some permutation of
the order of the phrases in $\mathcal{P}_{x^n}$. Not all the possible permutations are allowed,
though, since a phrase $p_j$ that is a prefix of another phrase $p_i$ must always
precede $p_i$. Thus, only permutations that respect the prefix partial order are
valid. Notice that, in general, the parsing tree is not necessarily complete.* We
call a node (or the corresponding phrase) a bridge if it has exactly one child. For
example, the phrase ‘0’ in the example of Figure 1 is a bridge.

Let $T^0$ and $T^1$ denote the subtrees of $T_{x^n}$ rooted at the respective children
of the root of $T_{x^n}$. Each subtree, in turn, defines the UT of a subsequence of
$x^n$; namely, $T^a$ defines the UTC $U(T^a)$ of the subsequence resulting from the
concatenation of all phrases in $\mathcal{P}_{x^n}$ starting with the symbol $a \in \{0, 1\}$. Notice
that either subtree might be missing if the root of $T_{x^n}$ is a bridge or a leaf.
When both children are missing, we have the trivial case where $x^n = \lambda$, and
$|U_{x^n}| = 1$. We denote the number of nodes in $T^a$ by $c_a + 1$, $a = 0, 1$. In analogy
with the notation for $T_{x^n}$, $c_a$ denotes the number of non-root nodes in the
respective subtree. When $T^a$ is missing, we set $c_a = -1$, and $|U(T^a)| = 1$. We have
$c(x^n) = c_0 + c_1 + 2$. A valid permutation defining a sequence $y^n \in U_{x^n}$ must result
from a valid permutation of the phrases in $T^0$ and a valid permutation of the
phrases in $T^1$. The resulting ordered sub-lists of phrases can be freely interleaved
to form a valid ordered list of phrases for $y^n$, since there is no order constraint
between phrases in different subtrees. The number of possible interleavings is,
therefore,

$M(c_0, c_1) = \binom{c_0 + c_1 + 2}{c_0 + 1},$

which is the well known formula for the number of ways to merge an ordered
list of size $c_0 + 1$ with one of size $c_1 + 1$, while preserving the respective orders.

* We call a binary tree complete if every node has either two children or none. There
is a remarkable lack of terminology consensus for this notion in the literature; see a
footnote in [9] for a sample of different terms authors have used.

Given the sub-lists and their interleaving, to completely specify $y^n$, we must
also define its tail, which can be any phrase of length $|t_x|$ (at least one such
phrase exists, namely, $t_x$ itself). Therefore, we can write the following recursion
for the size of $U_{x^n}$:

$|U_{x^n}| = |U(T^0)| \cdot |U(T^1)| \cdot M(c_0, c_1) \cdot m(|t_x|),$ (5)

where $m(|t_x|)$ denotes the number of nodes at level $|t_x|$ in $T_{x^n}$. Notice that
when (5) is used recursively, all recursion levels except the outermost deal with
natural UTCs. Therefore, a nontrivial term $m(|t_x|)$ occurs only at the outermost
application of (5).

Example 2. For the tree $T$ in Figure 1, we have $c_0 = 1$, $c_1 = 2$, $|t_x| = 0$. Therefore,

$|U(T)| = |U(T^0)|\,|U(T^1)|\,M(c_0, c_1) = (1) \cdot \left(1 \cdot 1 \cdot \binom{2}{1}\right) \cdot \binom{5}{2} = 20,$

which is the number of ways to merge the list [0, 00] with either the list [1, 10, 11]
or the list [1, 11, 10] while preserving the order of each list.
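
The count in Example 2 can be checked mechanically. The sketch below evaluates recursion (5) for a natural UTC (null tail, so the factor $m(|t_x|)$ equals 1) and compares it with a brute-force count of the phrase orderings that respect the prefix partial order. The tree is represented simply as the set of its non-root phrases; this representation and the function names are our own choices, not the paper's.

```python
from itertools import permutations
from math import comb


def subtree_nodes(phrases, v):
    """Number of tree nodes whose label starts with v (0 if v is absent)."""
    return sum(1 for p in phrases if p.startswith(v))


def utc_size(phrases, v=""):
    """Size of the natural UTC of the subtree rooted at v, via recursion (5)."""
    n0 = subtree_nodes(phrases, v + "0")   # c_0 + 1 (0 if T^0 is missing)
    n1 = subtree_nodes(phrases, v + "1")   # c_1 + 1 (0 if T^1 is missing)
    size0 = utc_size(phrases, v + "0") if n0 else 1
    size1 = utc_size(phrases, v + "1") if n1 else 1
    # M(c_0, c_1): interleavings of two ordered lists of sizes n0 and n1.
    return size0 * size1 * comb(n0 + n1, n0)


def brute_force_size(phrases):
    """Count orderings in which every phrase follows its one-symbol-shorter prefix."""
    total = 0
    for order in permutations(phrases):
        seen = {""}
        valid = True
        for p in order:
            if p[:-1] not in seen:
                valid = False
                break
            seen.add(p)
        total += valid
    return total


tree = ["1", "0", "10", "11", "00"]   # phrases of x^8 = 10101100
print(utc_size(tree), brute_force_size(tree))
```

The brute-force count is only practical for very small trees, since it enumerates all $c!$ orderings; the recursion needs one binomial coefficient per node.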

Let $\bar{T}$ denote the tree obtained from a parsing tree $T$ by collapsing paths of
the form $v_1 \to v_2 \to v_3$, where $v_2$ is a bridge node, to single edges $v_1 \to v_3$, or,
if the root is a bridge node, eliminating the root and its outgoing edge (with the
node at the end of this edge becoming the new root). Bridge nodes are eliminated
sequentially, one bridge at a time, until none are left. By construction, $\bar{T}$ is a
complete tree.

Lemma 1. We have $|U(T)| = |U(\bar{T})|$.

We denote by $c_B(x^n)$ the number of bridges in the parsing tree of $x^n$.

Theorem 2. Let $x^n$ be an arbitrary sequence of length $n$, and $\bar{c} = c(x^n) - c_B(x^n)$. Then,

$(1 - \beta)\,\bar{c} \log \bar{c} \le \log |U_{x^n}| \le \bar{c} \log \bar{c},$ (6)

where, as $n \to \infty$, $\beta$ can be bounded away from zero only if $\log c / \log n$ is bounded away
from one.

The bounds are expressed in terms of $\bar{c}$ rather than $c$, due to Lemma 1. The
upper bound is a straightforward consequence of the fact that the UTC size for
a sequence with $c$ phrases is upper-bounded by $c!$. A direct combinatorial proof of
the lower bound is presented in [7]. However, the necessity of a lower bound of this
kind follows from the optimality of the LZ78 algorithm, and the fact that UTs can
be used to define an asymptotically optimal enumerative coding procedure [10]
for sequences of length $n$, in which the code length assigned to $x^n$ is $\log |U_{x^n}| + o(n)$.
Theorem 2 shows another fundamental parallel between universal types and
conventional types: the size of a conventional type is known to be $2^{nH(x^n) - o(n)}$,
where $H(x^n)$ denotes the empirical entropy rate of $x^n$ with respect to the model
class defining the types (cf. [2,3]). Theorem 2 states that a similar statement
is true for UTs, with the normalized LZ78 code length $(c \log c)/n$ playing the
role of the empirical entropy rate. Notice that sequences for which $\log c / \log n$ is
bounded away from one have vanishing LZ78 compressibility rate, so the analogy
does not break when (6) is not tight.



3 Random Sequences from Universal Types 

The recursion (5) is helpful for deriving efficient procedures for enumerating $U_{x^n}$,
and for drawing a random sequence from it with uniform probability. We present
the random selection procedure here. Enumeration procedures are presented
in [7], and provide an alternative way of drawing a sequence from the UTC
with uniform probability.



3.1 Random Selection Algorithm 

The algorithm in Figure 2 draws a random sequence from $U_{x^n}$. In the algorithm,
we label nodes of $T$ with their associated phrases, we mark nodes as used or
unused, and we denote by $U(v)$ the number of unused nodes in the subtree
rooted at $v$. For a node $v$, and $b \in \{0, 1\}$, we say that the path from $v$ to $vb$ is
blocked if either there is no node labelled $vb$ in the tree, or $U(vb) = 0$.

Theorem 3. The algorithm in Figure 2 outputs a sequence drawn with uniform
probability from $U_{x^n}$. It requires a total of $\log |U_{x^n}| + o(n)$ random bits to do so.

Table 1 shows the steps taken in drawing a random sequence from the UTC
associated with the sequence $x^8$ and the parsing tree of Figure 1. The output of
the run is the string $y^8 = 0\,00\,1\,10\,11$, which is in $U_{x^8}$. Checking the probabilities
of the random choices made in Step 5, we observe that the execution path taken had
overall probability $\frac{2}{5} \cdot \frac{1}{4} \cdot \frac{1}{2} = \frac{1}{20}$, consistent with our previous determination of
$|U_{x^8}| = 20$ and a uniform distribution.



3.2 Application to Universal Simulation of Individual Sequences 

Informally, given an individual sequence $x^n$, we call a random sequence $y^n$ a
“good simulation” of $x^n$ if the following conditions hold:

1. $y^n$ is “statistically similar” to $x^n$;

2. given that $y^n$ satisfies Condition 1, it has the maximum possible entropy.
Input: $x^n$, parsing tree $T = T_{x^n}$.

Output: Sequence $y^n$, drawn with uniform probability from $U_{x^n}$.

1. Mark the root of $T$ as used, and all other nodes as unused.

2. Set $v \leftarrow \lambda$ (the root). If $U(v) = 0$, go to Step 6.
   Otherwise, proceed to Step 3.

3. If $v$ is unused, output $v$ as the next phrase of $y^n$, mark $v$ as used,
   and go to Step 2. Otherwise, proceed to Step 4.

4. If the path from $v$ to $v0$ is blocked, set $v \leftarrow v1$ and go to Step 3.
   Else, if the path from $v$ to $v1$ is blocked, set $v \leftarrow v0$ and go to Step 3.
   Otherwise, proceed to Step 5.

5. Draw a random bit $b$ with $\mathrm{Prob}(b = 1) = \frac{U(v1)}{U(v0) + U(v1)}$.
   Set $v \leftarrow vb$, and go to Step 3.

6. Output a random phrase of length $|t_x|$ as the tail of $y^n$. Stop.

Fig. 2. Algorithm for drawing a random sequence from $U_{x^n}$
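
The steps of Figure 2 can be sketched in Python as follows. This is an illustrative transcription under our own assumptions, not the authors' implementation: the tree is stored as a set of phrase strings with the empty string as the root, $U(v)$ is recomputed by scanning the node set (simple but not efficient), and the tail in Step 6 is drawn uniformly among the nodes at depth $|t_x|$.

```python
import random


def draw_from_utc(phrases, tail_len=0, rng=random):
    """Draw a sequence following the steps of Fig. 2.

    phrases  : non-root phrases of the parsing tree
    tail_len : |t_x|, the length of the tail
    rng      : object providing randrange() and choice()
    Returns (sequence, list of phrases in the order they were output).
    """
    nodes = set(phrases) | {""}
    used = {""}                                  # Step 1: only the root is used

    def unused(v):                               # U(v)
        return sum(1 for p in nodes if p.startswith(v) and p not in used)

    order = []
    while unused("") > 0:                        # Step 2
        v = ""
        while v in used:                         # Steps 3 to 5: walk down the tree
            u0, u1 = unused(v + "0"), unused(v + "1")
            if u0 == 0:                          # path to v0 is blocked
                v += "1"
            elif u1 == 0:                        # path to v1 is blocked
                v += "0"
            else:                                # Step 5: Prob(b = 1) = u1 / (u0 + u1)
                v += "1" if rng.randrange(u0 + u1) < u1 else "0"
        order.append(v)                          # Step 3: output v, mark it used
        used.add(v)
    tail = rng.choice(sorted(p for p in nodes if len(p) == tail_len))  # Step 6
    return "".join(order) + tail, order


rng = random.Random(0)
print(draw_from_utc(["1", "0", "10", "11", "00"], 0, rng))
```

Drawing repeatedly from the tree of Figure 1 should eventually produce each of the 20 members of the type class, which gives an empirical check of the count in Example 2.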




Fig. 3. Texture simulation 



Condition 1 is stated in a purposely vague fashion, as the desired similarity
criterion may vary from setting to setting. In [4], for example, $x^n$ is assumed
to have been emitted by a source from a certain parametric class, and a strict
criterion is used, where $y^n$ must be assigned exactly the same probability as $x^n$
by all sources in the class. We will be satisfied with a less stringent requirement
(since, among other things, we do not assume $x^n$ was generated by a probabilistic
source), as given by the property stated in Theorem 1. Condition 2, on the other
hand, is necessary to avoid, in the extreme case, a situation where $x^n$ (which
certainly satisfies Condition 1) is returned as its own “simulation.” We wish to
have as much variety as possible in the space of simulations of $x^n$. It is proved
in [7] that a simulation procedure based on drawing $y^n$ uniformly at random
from $U_{x^n}$ satisfies both conditions in a well defined mathematical sense.
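Table 1. Execution of the random selection algorithm

Step   | v    | status | U(v0),U(v1) | choice | Prob | output
-------+------+--------+-------------+--------+------+-------
2      | root |        |             |        |      |
3      | root | used   |             |        |      |
4      | root |        | 2,3         |        |      |
5      | root |        | 2,3         | 0      | 2/5  |
3      | 0    | unused |             |        |      | 0
2,3,4  | root |        | 1,3         |        |      |
5      | root |        | 1,3         | 0      | 1/4  |
3      | 0    | used   | 1,0         |        |      |
4      | 0    |        | 1,0         | 0      |      |
3      | 00   | unused |             |        |      | 00
2,3,4  | root |        | 0,3         | 1      |      |
3      | 1    | unused |             |        |      | 1
2,3,4  | root |        | 0,2         | 1      |      |
3,4    | 1    | used   | 1,1         |        |      |
5      | 1    |        | 1,1         | 0      | 1/2  |
3      | 10   | unused |             |        |      | 10
2,3,4  | root |        |             | 1      |      |
3,4    | 1    |        |             | 1      |      |
3      | 11   | unused |             |        |      | 11
2      | root |        | 0,0         |        |      |
6      | stop |        |             |        |      |
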

The simulation procedure outlined above was tested on some binary textures.
For the example in Figure 3, a 1024 x 1024 binary texture was generated, and
scanned with a Peano plane-filling scan, to produce a binary sequence $x^n$ of
$n = 2^{20}$ samples. The sequence was then “simulated” by generating a uniform
random sample $y^n$ from $U_{x^n}$. Finally, the sequence $y^n$ was mapped back, reversing
the same Peano scan order, to a 1024 x 1024 image. The left half of Figure 3
shows a 512 x 512 patch of the original texture, while the right half shows a
512 x 512 patch of the simulated one (the smaller patches were used to comply
with page size limitations without sub-sampling or altering the visual quality of
the images). It is evident from the figure that the right half indeed “looks like”
the left half, and the seam between the images is hardly noticeable. Yet, the
right half is completely different from the left half, and was selected from a very
large class of possible simulation images. In fact, the size of $U_{x^n}$ in this example
was estimated using the recursion (5), resulting in $\log |U_{x^n}| \approx 109{,}700$.



Acknowledgments. Thanks to Erik Ordentlich, Wojciech Szpankowski, Alfredo
Viola, Marcelo Weinberger, and Tsachy Weissman for very useful discussions.
Abstract. Separating codes, initially introduced to test automata,
have lately been revived in the study of fingerprinting codes, which are used
for copyright protection. Separating codes play their role in making the
fingerprinting scheme secure against coalitions of pirates. We provide
here better bounds, constructions and generalizations for these codes.



1 Introduction 

Separating codes were introduced in 1969 and have been the topic of several 
papers with various motivations. Many initial results are due to Sagalovich; 
see [3] for a survey, and also [2,5]. New applications of separating codes have 
appeared during the last decade, namely traitor tracing and fingerprinting. 

Fingerprinting is a proposed technique for copyright protection. The vendor 
has some copyrighted work of which he wants to sell copies to customers. If he is 
not able to prevent the customer from duplicating his copy, he may individually 
mark every copy sold with a unique fingerprint. If an illegal copy (for which the 
vendor has not been paid) subsequently appears, it may be traced back to one 
legal copy and one pirate via the fingerprint. A pirate is here any customer guilty 
of illegal copying of the copyrighted work. 

Traitor tracing is the same idea applied to broadcast encryption keys. E.g. 
the vendor broadcasts encrypted pay-TV, and each customer buys or leases a 
decoder box to be able to decrypt the programmes. If the vendor is not able to 
make the decoder completely tamperproof, he may fingerprint the decryption 
keys which are stored in the box. 

The set of fingerprints in use is called the fingerprinting code. Separating
codes are used in the study of collusion secure fingerprinting codes. If several
pirates collude, they possess several copies with different fingerprints. By comparing
their copies, they will find differences which must be part of the fingerprint.
These identified “marks” may be changed to produce a false fingerprint. A collusion
secure code should aim to identify at least one of the pirates from this
false fingerprint.



M. Farach-Colton (Ed.): LATIN 2004, LNCS 2976, pp. 322-328, 2004. 
© Springer- Verlag Berlin Heidelberg 2004 
We shall introduce two useful concepts regarding collusion secure codes. If
the code is $t$-frameproof, it is impossible for any collusion of at most $t$ pirates
to produce a false fingerprint which is also a valid fingerprint of an innocent
user. In other words, no user may be framed by a coalition of $t$ pirates or fewer. A
$t$-frameproof code is the same as a $(t, 1)$-separating code, which will be defined
formally in the next section.

If the code is t-identifying, the vendor is always able to identify at least one 
pirate from any coalition of size at most t, given a false fingerprint created by 
the coalition. A first step towards identification is (t, t)-separation (see, e.g. [4]), 
which we study and generalize here. 

2 Definitions 

For any positive real number $x$ we denote by $\lceil x \rceil$ the smallest integer at least
equal to $x$. Let $A$ be an alphabet of $q$ elements, and $A^n$ the set of sequences
of length $n$ over it. A subset $C \subseteq A^n$ is called an $(n, M)_q$ or $(n, M)$-code if
$|C| = M$. Its rate is defined by $R = (\log_q M)/n$. For any $x \in A^n$, we write $x_i$
for the $i$-th component, so that $x = (x_1, x_2, \ldots, x_n)$. The minimum Hamming
distance between two elements (codewords) of $C$ is denoted by $d(C)$ or $d$, and
the normalised quantity $d/n$ by $\delta$.

Consider a subset $C' \subseteq C$. For any position $i$, we define the projection
$P_i(C') = \bigcup_{a \in C'} \{a_i\}$. The feasible set of $C'$ is

$F(C') = \{x \in A^n : \forall i,\ x_i \in P_i(C')\}.$

If $C'$ is the set of fingerprints held by some pirate coalition, then $F(C')$ is the set
of fingerprints they may produce. If two non-intersecting coalitions can produce
the same descendant, i.e., if their feasible sets intersect, it will be impossible to
trace with certainty even one pirate. This motivates the following definition.

Definition 1. A code $C$ is $(t, t')$-separating if, for any pair $(T, T')$ of disjoint
subsets of $C$ where $|T| = t$ and $|T'| = t'$, the feasible sets are disjoint, i.e.

$F(T) \cap F(T') = \emptyset.$

Such codes are also called separating systems, abbreviated by SS. 

Since the separation property is preserved by translation, we shall always
assume that $0 \in C$. The separation property can be rephrased as follows when
$q = 2$: for any ordered $(t + t')$-tuple of codewords, there is a coordinate where the
$(t + t')$-tuple $(1 \ldots 1 0 \ldots 0)$ of weight $t$ or its complement occurs.

Given a $(t, t')$-configuration $(T, T')$ we define the separating set $\Theta(T, T')$ to
be the set of coordinate positions where $(T, T')$ is separated. Let $\theta(T, T') :=
\#\Theta(T, T')$ be the separating weight. Clearly $\theta(T, T') \ge 1$ is equivalent to
$(T, T')$ being separated. The minimum $(t, t')$-separating weight $\theta_{t,t'}(C)$ is the
least separating weight of any $(t, t')$-configuration of $C$. We abbreviate $\theta_{i,i}(C)$
to $\theta_i(C)$ or $\theta_i$. Clearly $\theta_1(C) = d(C)$. The minimum separating weights have
previously been studied by Sagalovich [3].
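
The definitions above translate directly into a brute-force check for small codes. The sketch below is our own illustration, assuming codewords are equal-length strings: a coordinate separates $(T, T')$ exactly when the projections of $T$ and $T'$ on it are disjoint, and the feasible sets are disjoint exactly when at least one such coordinate exists.

```python
from itertools import combinations, product


def projection(words, i):
    """P_i: the set of symbols that the given words show in position i."""
    return {w[i] for w in words}


def feasible_set(words):
    """F: all words that agree, in every position, with some given word."""
    n = len(words[0])
    choices = [sorted(projection(words, i)) for i in range(n)]
    return {"".join(x) for x in product(*choices)}


def separating_weight(T, Tp):
    """theta(T, T'): number of coordinates on which T and T' are separated."""
    n = len(T[0])
    return sum(1 for i in range(n)
               if projection(T, i).isdisjoint(projection(Tp, i)))


def min_separating_weight(code, t, tp):
    """theta_{t,t'}(C): minimum over all disjoint T, T' of sizes t and t'."""
    best = None
    for T in combinations(code, t):
        rest = [c for c in code if c not in T]
        for Tp in combinations(rest, tp):
            w = separating_weight(T, Tp)
            best = w if best is None else min(best, w)
    return best


code = ["000", "011", "101", "110"]          # binary even-weight code of length 3
print(min_separating_weight(code, 1, 1))      # equals the minimum distance
print(min_separating_weight(code, 2, 1))
print(sorted(feasible_set(["000", "011"])))
```

The search is exponential in $t + t'$ and is meant only for toy examples such as this one.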
3 Bounds on (t, 1) Separating Codes

The case $t' = 1$ corresponds to “frameproof” codes introduced in [1]. Korner
(personal communication) has a simplified proof of $R \le 1/2$ for (1,2)-separation
in the binary case. We generalize it to any $t$ and $q$, and for bounded separating
weight $n\tau$.

A $(t, \tau)$-coverfree code is a code with $(t, 1)$-separating weight at least equal
to $\tau n$. Their study in [11] and [9] is motivated by broadcast encryption.

Partition $\{1, 2, \ldots, n\}$ into $t$ almost equal parts $P_1, \ldots, P_t$ of size approximately
$n/t$. Say a codeword $c$ is isolated on $P_i$ if no other codeword projects on $P_i$ on
a vector located at distance less than $(n/t)\tau$ from $c$. Denote by $U_i$ the subset of
codewords isolated on $P_i$.

Lemma 1. If $C$ is $(t, \tau)$-coverfree, then every codeword $c$ of $C$ is isolated on at
least one $P_i$.

Proof: Suppose for a contradiction that there is a codeword $c_0$ which is
not isolated. Let $c_i$ be a codeword which is at distance less than $(n/t)\tau$ from $c_0$ when
projected onto $P_i$, for $i = 1, \ldots, t$. Now $c_0$ is separated from $\{c_1, \ldots, c_t\}$ on fewer
than $(n/t)\tau$ coordinates per block, or at most $n\tau - t$ coordinate positions in total.
This contradicts the assumption on the separating weight $\tau n$.

If we let $\tau$ tend to zero, we get an upper bound on the size of $(t, 1)$-separating
codes, which was found independently in [13] and [12]. The proofs are essentially
the same as the one presented here.

Theorem 1. If C is {t,T)-coverfree, then \C\ < , 

For constant $t$, this asymptotically gives a rate $R \le (1 - \tau)/t$ when $n$ increases.
A lower bound on the rate can now be obtained by invoking a sufficient condition
for $C$ to be $(t, \tau)$-coverfree, based on its minimum distance $d$: $td \ge (t - 1 + \tau)n$.
This is proved in a more general form in Proposition 1. Using algebraic-geometric
(AG) codes [7] with $\delta > 1 - t^{-1}(1 - \tau)$ and $R \approx 1 - \delta - 1/(q^{1/2} - 1)$ gives the following
asymptotically tight (in $q$) result:

Theorem 2. For fixed $t$ and large enough $q$, the largest possible rate of a $q$-ary
family of $(t, \tau)$-coverfree codes satisfies $R = t^{-1}(1 - \tau)(1 + o(1))$.

4 Large Separation 

Definition 2. A code $C$ of length $n$ is $(t, t', \tau)$-separating if, for any pair $(T, T')$
of disjoint subsets of $C$ where $|T| = t$ and $|T'| = t'$, $\theta(T, T') \ge \tau n$.

Proposition 1. A code with minimum distance $d$ is $(t, t', \tau)$-separating if

$tt'd \ge (tt' - 1 + \tau)n.$

Proof: Consider two disjoint sets $T$ and $T'$ of sizes $t$ and $t'$ respectively and
count the sum $S$ of pairwise distances between them: on one hand, $S \ge tt'd \ge
(tt' - 1 + \tau)n$. Computing $S$ coordinatewise now, each coordinate contributes at
most $tt'$, so the contribution to $S$ of at least $\tau n$ coordinates must be greater
than $tt' - 1$, i.e. equal to $tt'$. Thus, these coordinates separate $T$ and $T'$.
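
Proposition 1 can be read as a lower bound $tt'd - (tt' - 1)n$ on the separating weight. The sketch below compares that bound with an exhaustive computation on the binary $[7, 3]$ simplex code, which we chose as a convenient test case because all of its pairwise distances equal 4; the example and the helper names are ours, not the paper's.

```python
from itertools import combinations, product


def simplex_7_3():
    """All 8 codewords of the binary [7,3] simplex code, as strings."""
    gens = ["1010101", "0110011", "0001111"]
    words = set()
    for coeffs in product([0, 1], repeat=3):
        w = [0] * 7
        for c, g in zip(coeffs, gens):
            if c:
                w = [a ^ int(b) for a, b in zip(w, g)]
        words.add("".join(map(str, w)))
    return sorted(words)


def min_distance(code):
    """Minimum Hamming distance d(C)."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(code, 2))


def separating_weight(T, Tp):
    """Number of coordinates where the symbols of T and T' are disjoint."""
    return sum(1 for i in range(len(T[0]))
               if {w[i] for w in T}.isdisjoint({w[i] for w in Tp}))


def min_separating_weight(code, t, tp):
    """Exhaustive theta_{t,t'}(C)."""
    return min(separating_weight(T, Tp)
               for T in combinations(code, t)
               for Tp in combinations([c for c in code if c not in T], tp))


def guaranteed_weight(n, d, t, tp):
    """Separating weight implied by Proposition 1: t t' d - (t t' - 1) n."""
    return max(0, t * tp * d - (t * tp - 1) * n)


code = simplex_7_3()
d = min_distance(code)
print(d, guaranteed_weight(7, d, 2, 1), min_separating_weight(code, 2, 1))
```

In this example the exhaustive value exceeds the guaranteed one, which illustrates that the distance condition is sufficient but not necessary.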

To construct infinite families of separating codes over small alphabets, we 
can resort to the classical notion of concatenation. 

Definition 3 (Concatenation). Let $C_1$ be an $(n_1, Q)_q$ code and let $C_2$ be an
$(n_2, M)_Q$ code. Then the concatenated code $C_1 \circ C_2$ is the $(n_1 n_2, M)_q$ code
obtained by taking the words of $C_2$ and mapping every symbol onto a word from
$C_1$.
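
A small sketch of the concatenation operation may help fix the roles of the two codes. It assumes that the outer code $C_2$ is written over an explicit list of $Q$ symbols and that the $j$-th symbol is mapped to the $j$-th word of the inner code $C_1$; the particular mapping and the toy codes are our own choices.

```python
def concatenate(inner, outer, symbols):
    """Concatenated code C_1 o C_2.

    inner   : list of Q words of length n_1 over the small alphabet (C_1)
    outer   : list of M words of length n_2 over `symbols` (C_2)
    symbols : the Q symbols of the outer alphabet, in the order in which
              they are mapped onto the words of `inner`
    Returns M words of length n_1 * n_2 over the small alphabet.
    """
    if len(inner) < len(symbols):
        raise ValueError("inner code needs one word per outer symbol")
    encode = dict(zip(symbols, inner))
    return ["".join(encode[s] for s in word) for word in outer]


inner = ["000", "011", "101", "110"]     # a (3, 4)_2 code
outer = ["abc", "bcd", "dda"]            # a (3, 3)_4 code over {a, b, c, d}
print(concatenate(inner, outer, "abcd"))
```

The resulting words have length $n_1 n_2 = 9$, and the number of codewords is that of the outer code.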

The following result is an easy consequence of the definition. 

Proposition 2. Let $\Gamma_1$ be an $(n_1, M)_{M'}$ code with minimum separating weight
$\theta^{(1)}_{t,t'}$, and let $\Gamma_2$ be an $(n_2, M')_q$ code with minimum separating weight $\theta^{(2)}_{t,t'}$. Then
the concatenated code $\Gamma := \Gamma_2 \circ \Gamma_1$ has minimum separating weight
$\theta_{t,t'} = \theta^{(1)}_{t,t'} \cdot \theta^{(2)}_{t,t'}$.

We shall illustrate the concatenation method with $q = 2$, $t = 2$, $t' = 1$ in the
next section.

5 The Binary Case 

5.1 (2, l)-Separation 

In [8], it was pointed out that shortened Kerdock codes $K'(m)$ for $m \ge 4$ are
(2,1)-separating. Take an arbitrary subcode of size $11^2$ in $K'(4)$, which is a $(15, 2^7)$
(2,1)-SS. Concatenate it with an infinite family of algebraic-geometry codes over
$GF(11^2)$ (the finite field with $11^2$ elements) with $\delta > 1/2$ (hence (2,1)-separating
by Proposition 1) and $R \approx 1/2 - 1/10$ [7]. After some easy computations, this
gives:

Theorem 3. There is a constructive asymptotic family of binary (2,1)-separating
codes with rate $R = 0.1845$.
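
The "easy computations" can be retraced as follows. This sketch assumes the reading used above: an inner binary code of length 15 with $11^2$ codewords, and an outer AG family over $GF(11^2)$ with $\delta$ just above $1/2$ and rate close to $1 - \delta - 1/(\sqrt{q} - 1)$. The rate of a concatenated code is the product of the two rates.

```python
from math import log2, sqrt

q = 11 ** 2                              # outer alphabet size = inner code size
inner_rate = log2(q) / 15                # (15, 11^2) binary inner code
outer_rate = 1 - 0.5 - 1 / (sqrt(q) - 1) # AG family with delta -> 1/2
rate = inner_rate * outer_rate

print(round(inner_rate, 4), round(outer_rate, 4), round(rate, 4))
```

Under these assumptions the product agrees with the rate stated in Theorem 3 to four decimal places.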

This can even be refined if we concatenate with the codes contained in the 
following proposition from [10]. 

Proposition S. Suppose that q = with p prime, and that t is an integer 
such that 2 < t < y/q— 1- Then there is an asymptotic family of (t, l)-separating 
codes with rate 

t yg-1 f(V9-l)' 

Remark 1. If we use Xing’s codes [10], we get an improved rate of $R \approx
0.2033$, but at the expense of constructivity.
5.2 A Stronger Property

Definition 4 (Completely Separating Code). A binary code is said to be
completely separating ($(t, t')$-CSS) if, for any ordered set of $t + t'$ codewords,
there is at least one column with 1 in the $t$ upper positions and 0 elsewhere,
and one column with 0 in the $t$ upper positions and 1 in the $t'$ lower
ones.

We define $R_{SS}(t, t')$ as the largest possible asymptotic rate of a family of
$(t, t')$-SS, and similarly $R_{CSS}(t, t')$ for $(t, t')$-CSS. We clearly have

$R_{SS}(t, t') \ge R_{CSS}(t, t') \ge \frac{1}{2} R_{SS}(t, t').$ (1)



5.3 Improved Upper Bounds on (t, t)-Separating Codes 

Theorem 4. A $(t, t)$-separating $(\theta_0, M, \theta_1)$ code with separating weights
$(\theta_1, \ldots, \theta_t)$ gives rise to an $(i, i)$-CSS $(\theta_{t-i}, M - 2t + 2i, 2\theta_{t-i+1})$ with
complete-separating weight $\theta_i$, for any $i < t$.

Proof: Consider a pair of $(t - i)$-tuples of vectors which are separated on $\theta_{t-i}$
positions. Pick any vector $c$ from the first $(t - i)$-tuple and replace the code $C$
by its translation $C - c$. Thus all the columns which separate the two tuples
have the form $(0 \ldots 0 1 \ldots 1)$.

Now consider any two $i$-tuples of vectors. Coupling each $i$-tuple with a $(t - i)$-tuple,
we get two $t$-tuples which must be separated on $\theta_t$ positions, i.e. the two
$i$-tuples must have at least $\theta_t$ columns of the form $(0 \ldots 0 1 \ldots 1)$. Now, observe
that we can swap the two $(t - i)$-tuples, and the two resulting $t$-tuples are still
separated. This guarantees at least $\theta_t$ columns of the form $(1 \ldots 1 0 \ldots 0)$.

Deleting all the columns where the two $(t - i)$-tuples are not separated, and
the words of these two tuples, leaves us with an $(i, i)$-CSS with complete-separating
weight $\theta_i$ and parameters $(\theta_{t-i}, M - 2t + 2i, 2\theta_{t-i+1})$, as required.

Theorem 5. Any completely $(t, t)$-separating $(\theta_0, M, 2\theta_1)$ code with complete-separating
weights $(\theta_1, \ldots, \theta_t)$ gives rise to a completely $(i, i)$-separating
$(\theta_{t-i}, M - 2t + 2i, 2\theta_{t-i+1})$ code with complete-separating weight $\theta_i$, for any $i < t$.

This is proved in the same way as the previous theorem.

Theorem 6. For any $(t, t)$-CSS, the rate $R_t$ satisfies

$R_t \le R(2R_t / R_{t-1}),$

where $R(\delta)$ is any upper bound on the rate of error-correcting codes in terms of
the normalised minimum distance, and $R_{t-1}$ is the upper bound on the rate of
any $(t - 1, t - 1)$-CSS.

Proof: Let $C_{t-1}$ be the $(t - 1, t - 1)$-CSS which exists by Theorem 5, and let
$R_{t-1}$ be its rate. We have that

$\delta_t = 2\,\frac{\theta_1}{\theta_0} = 2\,\frac{\log M}{\theta_0} \cdot \frac{\theta_1}{\log M} = 2R_t / R_{t-1}.$

Now, obviously $R_t \le R(\delta_t)$, which is decreasing in $\delta_t$, and this gives the result.
With a completely analogous proof, we also get the following.

Theorem 7. For any $(t, t)$-SS, the rate $R$ satisfies

$R \le R(R / R_{t-1}),$

where $R(\delta)$ is any upper bound on the rate of error-correcting codes in terms of
the normalised minimum distance, and $R_{t-1}$ is the upper bound on the rate of
any $(t - 1, t - 1)$-CSS.



Table 1. Rate bounds on CSS and SS.

       |      Bound 1        | D'yachkov et al. |      Bound 2
(t,t)  | CSS rate | SS rate  | CSS rate         | CSS rate | SS rate
-------+----------+----------+------------------+----------+---------
(1,1)  | 1        | 1        | 1                | 1        | 1
(2,2)  | 0.1712   | 0.2835   | 0.161            | -        | -
(3,3)  | 0.03742  | 0.06998  | 0.0445           | 0.0354   | 0.0663
(4,4)  | 0.008843 | 0.01721  | 0.0123           | 0.00837  | 0.0163
(5,5)  | 0.002156 | 0.004261 | 0.00333          | 0.00204  | 0.00404


Setting equality in the bounds and solving, we get the upper bounds given as 
‘Bound 1’ in Table 1. Comparing with the CSS bounds of [6] shows an improve- 
ment from (3,3)-CSS onwards. However, [6] has a good bound on (2,2)-CSS, 
used as a seed for the recursive bounds of our theorems to obtain ‘Bound 2’ in 
the table. 
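
The recursive bounds can be evaluated numerically by solving $R = R(\cdot)$ with bisection. The sketch below assumes that the "linear programming bound" is the first McEliece-Rodemich-Rumsey-Welch bound, $R(\delta) = h_2(1/2 - \sqrt{\delta(1 - \delta)})$; the text does not say which linear programming bound was used, so this choice, like the function names, is ours.

```python
from math import log2, sqrt


def h2(x):
    """Binary entropy function."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def mrrw1(delta):
    """First MRRW upper bound on the rate of binary codes (assumed here)."""
    if delta <= 0.0:
        return 1.0
    if delta >= 0.5:
        return 0.0
    return h2(0.5 - sqrt(delta * (1 - delta)))


def solve(factor, seed, bound=mrrw1):
    """Largest R with R <= bound(factor * R / seed), found by bisection.

    factor = 2 corresponds to Theorem 6 (CSS), factor = 1 to Theorem 7 (SS);
    seed is the rate bound for the (t-1, t-1)-CSS.
    """
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if bound(min(1.0, factor * mid / seed)) >= mid:
            lo = mid
        else:
            hi = mid
    return lo


print(round(solve(2, 1.0), 4))     # (2,2)-CSS, seeded with rate 1 for (1,1)
print(round(solve(1, 1.0), 4))     # (2,2)-SS
print(round(solve(1, 0.161), 4))   # (3,3)-SS, seeded with the bound of [6]
```

Under this assumption the three values come out close to the corresponding (2,2) entries of ‘Bound 1’ and the (3,3) SS entry of ‘Bound 2’ in Table 1.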

Example 1. Let $C_1$ be an asymptotic class of $(\theta_0, 2^k, \theta_1)$ (3,3)-SS. Then there
is an asymptotic class of $(\theta_1, 2^k, \theta_2)$ (2,2)-CSS. We have that $R_2 = k/\theta_1 \le
0.161$, and

$R_1 = k/\theta_0 = R_2 \delta_1 \le 0.161\,\delta_1,$

which is equivalent to $\delta_1 \ge R_1/0.161$. We can use any upper bound $R(\delta)$ on $R_1$,
and get

$R_1 \le R(\delta_1) \le R(R_1/0.161),$

and $R_1 \le 0.0663$ by the linear programming bound.
Abstract. We study the problem of encoding homotopy of simple paths
in the plane. We show that the homotopy of a simple path with $k$ edges
in the presence of $n$ obstacles can be encoded using $O(n \log(n + k))$ bits.
The bound is tight if $k = \Omega(n^{1+\epsilon})$. We present an efficient algorithm
for encoding the homotopy of a path. The algorithm can be applied to
find homotopic paths among a set of simple paths. We show that the
homotopy of a general (not necessarily simple) path can be encoded using
$O(k \log n)$ bits. The bound is tight. The code is based on a homotopic
minimum-link path and we present output-sensitive algorithms for computing
a path and the code.



1 Introduction 

A fundamental problem is to find shortest paths in a geometric domain [12]. 
Chazelle [4] and Lee and Preparata [10] gave a funnel algorithm that computes 
the shortest path between two points in a simple polygon. Hershberger and 
Snoeyink [8] simplified the funnel algorithm and studied various optimizations 
of a given path among obstacles under the Euclidean and link metrics and under 
polygonal convex distance functions. 

The topological concept of homotopy captures the notion of deforming paths.
A path is a continuous map $\pi : [0, 1] \to \mathbb{R}^2$. Let $\alpha, \beta : [0, 1] \to \mathbb{R}^2$ be two paths
that share starting and ending endpoints, $\alpha(0) = \beta(0)$ and $\alpha(1) = \beta(1)$. Let
$B \subset \mathbb{R}^2$ be a set of barriers that includes the endpoints of $\alpha$ and $\beta$. We assume
that the interiors of the paths $\alpha$ and $\beta$ avoid $B$, i.e. $\{t \mid \alpha(t) \in B\} = \{t \mid \beta(t) \in
B\} = \{0, 1\}$. The paths $\alpha$ and $\beta$ are homotopic with respect to the barrier set $B$
if $\alpha$ can be continuously transformed into $\beta$ avoiding $B$.

Problems related to the homotopy of paths in the plane have received attention
very recently [3,6,2]. In this paper we consider the following questions. How can the
homotopy of a path be represented in a computer? What is the minimum
number of bits needed to encode the homotopy of a path with $k$ edges in the
presence of $n$ obstacles?

Homotopy Encoding. Let $B$ be a set of $n$ barrier points in the plane
and $s$ and $t$ be two barrier points. Let $\Pi$ be a class of $st$-paths with at
most $k$ edges such that each path intersects $B$ at its endpoints only.
Find an integer $N$ and a map $\tau : \Pi \to [0..N]$ such that two
paths $\pi_1, \pi_2 \in \Pi$ are homotopic iff their codes are equal, $\tau(\pi_1) = \tau(\pi_2)$.

Minimize the number of bits $\log N$ as a function of $n$ and $k$.

To the best of our knowledge this paper is the first to study the homotopy
encoding problem (HEP). Homotopy encoding can be used to solve the problem
of testing homotopy of two paths [3]. One can test homotopy of two paths by
first encoding their homotopies and then comparing the codes. Cabello et al. [3]
established a criterion for two simple paths to be homotopic. They introduced a
canonical sequence for a path and proved that two simple paths are homotopic if
their canonical sequences are equal. Unfortunately, the canonical sequences can
have $\Omega(nk)$ length, which makes them ineffective for testing homotopy. Cabello
et al. [3] found a way of avoiding the computation of canonical sequences and test
homotopy in $O(m \log m)$ time where $m = k + n$.

We focus on two classes of paths: simple paths and general (not necessarily
simple) paths. For the simple paths we show a lower bound of $\Omega(n \log k)$ for the
number of bits in a homotopy code if $k = \Omega(n^{1+\epsilon})$. We introduce a spanning
homotopic graph and show that it can be used to recognize the path homotopy.
It can be used to encode the homotopy of a simple path using n(log n + 3 log k +
3 log 12) + o(n) bits. The bound is tight if $k = \Omega(n^{1+\epsilon})$.

The main difficulty in homotopy encoding lies in computation of the spanning 
homotopic graph. Our approach is based on recently developed techniques for 
computing shortest homotopic paths [3,6,2]. We are interested in a special case 
of the problem (single path) : given a path with k edges and a set of n barriers, 
find the shortest homotopic path. Very recently Efrat et al. [6] presented an
output-sensitive algorithm for computing the shortest homotopic paths. The algorithm
runs in 0(r?!'^ + A;logn + K) time where K is the size of the output path. 
Note that $K$ can be $\Omega(kn)$ in the worst case. They also gave a randomized
algorithm with $O(n \log^{1+\epsilon} n + k \log n + K)$ running time. In [2] the deterministic
algorithm was improved and a running time of $O(n \log^{1+\epsilon} n + k \log n + K)$ was achieved.
We show that the path homotopy can be encoded in $O(m \log n)$ time where
$m = \max(n, k)$. We show that the shortest homotopic path can be retrieved (or
decoded) from the homotopy code in $O(n + K)$ time.

For non-simple paths, we show a lower bound of $\Omega(k \log n)$ for any homotopy
encoding. We provide a homotopy code that achieves this bound. We introduce 
canonical minimum-link path homotopic to a path and show that it can be 
computed in -\- k\o^ n) time. The path can be used for the homotopy 

code. We also show that using space-time tradeoff the running time can be 
improved in the case k < to 0((n-|- A:-|- (nA:)^/^)polylog(n)). This improves 
an algorithm by Hershberger and Snoeyink [8] that computes a minimum-link 
path homotopic to a path in $O(nk)$ time.

Our algorithms for homotopy encoding can be used for testing homotopy
among multiple paths. This can be applied both to simple paths and to general
paths. To the best of our knowledge the problem with multiple paths has not been
considered before. It can be stated as homotopy classification: given simple/non-simple
paths $\Pi = \{\pi_1, \ldots, \pi_l\}$ in the plane avoiding a set of $n$ barriers, partition $\Pi$
into classes of homotopy equivalent paths. For simple paths, the problem can be
solved in $O(M \log n)$ time where $M = \max(n, K)$ and $K$ is the total complexity of
the paths in $\Pi$. For non-simple paths the problem can be solved in
$O((n + K + (nK)^{2/3})\,\mathrm{polylog}(n))$ time. The solution is based on computing the
homotopy code for each path and sorting the codes.



Fig. 1. Lower bound, $a = (1, 0, 1, 2, 0, 3, 1, 2)$. The path winds around a point $p_i$
$a_1 + \ldots + a_i$ times. (The figure shows barrier points $p_1, \ldots, p_8$ on a horizontal line.)



2 Lower Bound for Simple Paths 

In this section we show a lower bound for a homotopy code of simple paths.

Theorem 1. Suppose that k > Then any homotopy code for simple paths
has at least $\Omega(n\log k)$ bits.

Proof. Let $p_1, p_2, \ldots, p_n$ be barrier points located on a horizontal line in decreasing
order of $x$-coordinates, see Fig. 1. We put the points $s$ and $t$ on the same
line so that all the barrier points are on the right side. Let $m = \lfloor k/4 \rfloor$ and let $a$
be a vector from $\mathbb{Z}^n$ such that all $a_i \ge 0$ and $\sum_{1 \le i \le n} a_i \le m$. We construct a
simple path such that, for each $i$, the path winds around the point $p_i$ $\sum_{1 \le j \le i} a_j$
times, see Fig. 1. This property holds for the shortest homotopic path as well.
Therefore distinct vectors $a$ and $a'$ correspond to non-homotopic paths. Thus
the total number of pairwise non-homotopic paths of length $k$ is at least the
number $t(n, k)$ of distinct vectors $a$.

Suppose that k => . A simple combinatorial observation is that
$t(n, k) = \binom{m+n}{n} = (m + n)!/(n!\,m!)$. Using Stirling's formula one can obtain
$\log t(n, k) = \Omega(n\log k)$.
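
The counting step of the proof can be checked numerically. The following is a minimal sketch, assuming that $t(n, k)$ counts the nonnegative integer vectors of length $n$ whose entries sum to at most $m = \lfloor k/4 \rfloor$ (this is the quantity that the closed form $(m+n)!/(n!\,m!)$ gives). The function names are illustrative and do not come from the paper.

```python
from itertools import product
from math import comb


def count_vectors(n, k):
    """Closed form t(n, k) = C(m + n, n) with m = floor(k / 4)."""
    m = k // 4
    return comb(m + n, n)


def count_vectors_brute_force(n, k):
    """Enumerate nonnegative integer vectors of length n with sum <= m."""
    m = k // 4
    return sum(1 for a in product(range(m + 1), repeat=n) if sum(a) <= m)


def code_length_lower_bound(n, k):
    """Bits needed to distinguish t(n, k) classes, i.e. ceil(log2 t(n, k))."""
    return (count_vectors(n, k) - 1).bit_length()


# Small sanity check of the closed form against direct enumeration.
for n, k in [(2, 8), (3, 12), (4, 20)]:
    assert count_vectors(n, k) == count_vectors_brute_force(n, k)
```

The brute-force enumeration is exponential in $n$ and is only meant to confirm the closed form on tiny inputs.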

3 Homotopy Code for Simple Paths 

In this section we introduce a fairly simple scheme to encode the homotopy
of simple paths that achieves optimal size if k is larger than The idea is
based on the shortest homotopic path. We define a spanning homotopic graph $G$
as follows. Let $\pi$ be a simple path in the presence of obstacles $B$. Let $\pi'$ be its
shortest homotopic path. The graph $G$ has $B \cup \{s, t\}$ as the set of vertices and
two vertices $a$ and $b$ are adjacent if the segment $ab$ is in $\pi'$. Abusing notation we
treat the vertices of $G$ as points in the plane as well. An edge can be traversed many
times if one walks along $\pi'$. We assign a weight $w(e)$ to an edge $e = (a, b)$ to be
the number of times $e$ is traversed. Let $p$ be a vertex of $G$. Let $w(p)$ be the weight
of $p$, defined as the sum of the weights of the edges in $G$ incident to $p$.

A vertex $v$ of a planar graph embedded in the plane is called pointed if there
is a line $\ell$ passing through $v$ such that all the edges incident to $v$ are located in
one of the halfplanes defined by $\ell$. We call a planar graph pointed if all its vertices
are pointed. The spanning homotopic graphs possess the following properties.

Lemma 1. Let $G$ be a spanning homotopic graph. Then $G$ is pointed and its
vertices satisfy a parity property: vertices from $B$ have even weights and the
vertices $s$ and $t$ have odd weights.
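
The two necessary conditions of Lemma 1 are easy to test on a given weighted graph. The sketch below is illustrative only: it assumes vertices are given as coordinate pairs for the pointedness test and as hashable names for the parity test, treats halfplanes as closed, and uses a small floating-point tolerance `eps`. None of these names come from the paper.

```python
import math


def is_pointed(vertex, neighbors, eps=1e-12):
    """True if all edges at `vertex` fit in one closed halfplane through it.

    The incident edge directions fit in a halfplane exactly when some
    angular gap between consecutive directions is at least pi.
    """
    angles = sorted(
        math.atan2(y - vertex[1], x - vertex[0]) for x, y in neighbors
    )
    if len(angles) <= 1:
        return True
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return max(gaps) >= math.pi - eps


def satisfies_parity(edge_weights, s, t):
    """Check the parity property: s and t have odd weight, all others even.

    edge_weights maps an edge (a, b) to the number of times it is traversed.
    """
    weight = {}
    for (a, b), w in edge_weights.items():
        weight[a] = weight.get(a, 0) + w
        weight[b] = weight.get(b, 0) + w
    vertices = set(weight) | {s, t}
    return all((weight.get(v, 0) % 2 == 1) == (v in (s, t)) for v in vertices)
```

As the text notes right after the lemma, passing both checks does not guarantee that the graph is a spanning homotopic graph.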

The properties in Lemma 1 are not sufficient for a weighted planar graph
to be a spanning homotopic graph, see Fig. 2. We extend the definition of the
spanning homotopic graph to a set of disjoint and simple paths, open or closed
(note that, unlike open paths, a closed path does not contribute to the set of
graph vertices). Let $\Pi$ be a collection of disjoint simple paths $\pi_1, \pi_2, \ldots, \pi_m$; each
path is either open or closed. We assume that the endpoints of an open path
$\pi_i$ are barrier points (note that any two open paths have disjoint endpoints). For
each path $\pi_i \in \Pi$ we find its shortest homotopic path in the presence of obstacles
$B$. A vertex of the spanning homotopic graph $G$ of $\Pi$ is either a barrier point
or an endpoint of an open path $\pi_i$. A pair $(p, q)$ is an edge of $G$ if the shortest
path of some path $\pi_i$ contains the line segment $pq$. The weight of an edge $(p, q)$ is defined
as the number of times the segment $pq$ is traversed by the shortest paths.




Fig. 2. (a) Graph G is pointed and satisfies the parity property but (b) has no under- 
lying simple path. 



Theorem 2. Let $\Pi$ be a set of disjoint simple paths $\pi_1, \pi_2, \ldots, \pi_m$. Let $\Pi'$ be
a set of disjoint simple paths $\pi'_1, \pi'_2, \ldots, \pi'_{m'}$. If the spanning homotopic graphs
of $\Pi$ and $\Pi'$ are equal then $m' = m$ and there is a permutation $\sigma_1, \sigma_2, \ldots, \sigma_m$
such that, for every $i$, the paths $\pi_i$ and $\pi'_{\sigma_i}$ are either

(i) both closed paths that are homotopic in the presence of obstacles $B$, or
(ii) both open paths with the same endpoints from $B$ that are homotopic in the
presence of obstacles $B$.

We apply Theorem 2 to a single path. 

Corollary 1. Two st-paths induce the same spanning homotopic graph iff they 
are homotopic. 

Let $v_1, v_2, \ldots$ be the list of vertices from $V$. We define the homotopy code $\chi(\pi, B)$
of a simple path $\pi \in \Pi$ as the sequence of triples $(v_i, v_j, w(v_i, v_j))$ in lexicographical
order. By Corollary 1 two paths are homotopic iff their homotopy codes
$\chi$ are equal.
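
Once the weighted spanning homotopic graph is available, forming the code and comparing two codes is straightforward. The sketch below assumes the graph is already given as a dictionary from vertex-index pairs to traversal counts; computing that graph is the hard part and is not shown. The names `homotopy_code` and `same_homotopy` are hypothetical.

```python
def homotopy_code(edge_weights):
    """Return the triples (i, j, w) with i < j in lexicographic order.

    edge_weights maps an unordered pair of vertex indices, in either
    orientation, to the number of times the segment is traversed.
    """
    triples = []
    for (i, j), w in edge_weights.items():
        if w > 0:
            triples.append((min(i, j), max(i, j), w))
    return sorted(triples)


def same_homotopy(edges_a, edges_b):
    """Two st-paths are homotopic iff their codes coincide (Corollary 1)."""
    return homotopy_code(edges_a) == homotopy_code(edges_b)
```

Sorting the triples makes the code independent of the order and orientation in which edges were recorded, so equality of codes reduces to a list comparison.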

We mention for completeness that, for every pointed weighted graph G, there 
is a set of paths whose spanning homotopic graph is G. 



3.1 Succinct Homotopy Code 

The number of triples $(v_i, v_j, w(v_i, v_j))$ in a homotopy code is $O(n)$. The explicit
representation of a homotopy code requires $O(n\log(n+k))$ bits since the indices $i$
and $j$ need $O(\log n)$ bits (this is the widely used adjacency-list encoding) and the
weight $w(v_i, v_j)$ needs $O(\log k)$ bits. For small values of $k$ this representation can
exceed the $O(n\log k)$ bound. We apply succinct encoding of labeled planar graphs
[9,13] where the vertices of a planar graph are embedded into the plane and
labeled. Keeler and Westbrook [9] proved that a labeled planar graph with $n$
vertices and $m$ edges can be encoded using $n\log n + 3n\log 12 + o(n)$ bits.

We encode the spanning homotopic graph without weights and the weights
separately. We label the spanning homotopic graph according to the lexicographical
order of the vertex coordinates. The weight components of $\chi$ can be
encoded in $m\log k < 3n\log k$ bits. This implies the following theorem.

Theorem 3. The homotopy of a simple path can be encoded using
$n(\log n + 3\log k + 3\log 12) + o(n)$ bits.

4 Shortest Homotopic Paths 

In this section we briefly describe the construction of canonical paths [3], the
bundling [6] and an algorithm [2,3,6]. As in [6] we use the canonical paths to
shortcut the given path and divide it into $x$-monotone paths. We can treat the
monotone paths as horizontal segments and obtain rectified paths [3]. To rectify
the paths one needs the “aboveness” relation between the monotone paths and the
barriers. This can be computed using a triangulation or trapezoidization [6] by
an algorithm of Bar-Yehuda and Chazelle [1]. The running time is
$O(n\log^{1+\varepsilon} n + k_{in})$ for any fixed $\varepsilon > 0$. The rectified paths can be shortcut by vertical segments.
Applying the segment dragging queries by Chazelle [5], shortcuts can be done in
$O(k_{in}\log n)$ time using $O(n\log n)$ preprocessing and $O(n)$ space.
The number of homotopically different paths produced by shortcutting is at
most $2n$ [6]. The homotopic paths can be bundled, reducing the problem to the
case $k_{in} \le 2n$.

The problem is further reduced to finding a shortest monotone path in a simple
polygon with barriers colored in two colors [2]. A hierarchical data structure
is constructed to compute the paths efficiently. It stores the paths in a compact
way by partitioning them into so-called $\lambda$-paths. We modify the data structure
to store weighted $\lambda$-paths where the weight is the number of times the $\lambda$-path is
traversed. The weights of the paths can be used to compute the weights of edges
of the spanning homotopic graph. We leave the details for the final version.

Theorem 4. The homotopy code $\chi(\pi, B)$ of a simple path $\pi$ can be computed
in $O((n + k)\log n)$ time using $O(n + k)$ space.

We show how the shortest homotopic path can be extracted from the homotopy
code $\chi$. We assume that the weights of edges take $O(1)$ space per weight.
This is a reasonable assumption since weights are bounded by $k$. The running time
is $O(n + K)$.

5 Non-simple Paths 

5.1 Lower Bound 

We show a lower bound for general paths and design a homotopy code achieving
this bound.

Lemma 2. Any homotopy code for general paths has at least $\Omega(k\log n)$ bits.

Proof. To show the lower bound we construct paths from $s$ to $t$ with pairwise
different homotopy. Let $a = (a_1, a_2, \ldots, a_m)$, $m = \lfloor k/3 \rfloor$, be a sequence of integers
with $1 \le a_i \le n$ such that there are no repetitions $a_i \ne a_{i+1}$ and $p_{a_1} \ne s$, $p_{a_m} \ne t$. We generate
a path for each sequence $a$. The idea is that, for three consecutive numbers
$a_{i-1}, a_i, a_{i+1}$, we can make a path from $p_{a_{i-1}}$ to $p_{a_{i+1}}$ using two or three links such that
$p_{a_i}$ supports the path, see Fig. 3. The path has at most $3m \le k$ edges (each
point $p_{a_i}$ contributes at most three edges to the path).
All the generated paths have different homotopy since the shortest homotopic
path for $a$ is $s\,p_{a_1} \cdots p_{a_m}\,t$. The number of paths is $s(n, m) = n(n-1)^{m-1}$. The
lower bound follows since $\log s(n, m) = \Omega(m\log n) = \Omega(k\log n)$.



5.2 Supported Paths 

To show the upper bound we need some notation and properties, and then we introduce
a canonical homotopic path for a general path $\pi$. The key idea is based on a
minimum-link path, defined as a homotopic path with the minimum number of
edges. A minimum-link path is not unique but it preserves the path complexity.
Hershberger and Snoeyink [8] designed an algorithm for finding a minimum-link
path in a simple polygon with holes. The polygon is triangulated. They proved
that the minimum-link path $\pi'$ homotopic to a given path $\pi$ can be found in
time $O(C_\pi + \Delta_\pi + \Delta_{\pi'})$, where $C_\pi$ is the path complexity and $\Delta_\alpha$ is the number of
times that a path $\alpha$ crosses a triangulation edge. In terms of $n$ and $k$, this bound is
$O(nk)$. The algorithm exploits useful properties of minimum-link paths.




Fig. 4. Support points. Arrows show the sides of support. (a) Point $p_1$ supports the
edge $q_i q_{i+1}$ from below, (b) points $p_1$ and $p_2$ support the inflection edge $q_i q_{i+1}$ from
below and above, and (c) point $p_1$ supports the start edge $q_0 q_1$.



The approach by Ghosh [7] and Hershberger and Snoeyink [8] is based on
computing the shortest homotopic path. This approach in our setting leads to
an $\Omega(nk)$ algorithm since the complexity of the shortest homotopic path can be
$\Omega(nk)$. In order to obtain a faster algorithm we apply another approach using
supported paths. Let $\pi = (q_0 = s, q_1, \ldots, q_k = t)$ be a path from $s$ to $t$. Traversing
$\pi$ from $s$ to $t$, we can label each vertex as a left or right turn. An edge $q_i q_{i+1}$, $1 < i < k$,
whose endpoints make left turns is supported if it touches a barrier point
on its left side, see Fig. 4 (a). The barrier point is called a support point of the
edge. The support point can be one of the endpoints of the edge. Similarly we
define a supported edge whose endpoints make right turns. As in [8] we call
an edge $q_i q_{i+1}$, $1 < i < k$, an inflection edge if its endpoints make different turns.
We require two support points for the inflection edge $q_i q_{i+1}$ to be supported;
the barrier points should be on different sides of the edge and their order should
correspond to the turns at $q_i$ and $q_{i+1}$, see Fig. 4 (b). We call the edge incident to
the support points of an inflection edge a support edge. We also define conditions
for the first and last edge to be supported. The edge $q_0 q_1$ is supported if $q_1$
makes a left/right turn and $q_0 q_1$ has a barrier on its left/right side, see Fig. 4
(c). The support of the last edge is defined similarly. A path is supported if all
its edges are supported. An extension of a line segment $ab$ is any line segment
containing $ab$. Note that a supported inflection edge is an extension of its support
edge.
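
The turn labels and the inflection edges of a polygonal path can be read off from the sign of a cross product. The following is a minimal sketch under stated assumptions: the path is a list of coordinate pairs `q[0], ..., q[k]`, collinear interior vertices are labeled `'S'` (a case the text does not discuss), and an edge is identified by the index `i` of its first endpoint. Support points and barriers are not modeled here.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def turn_labels(path):
    """Label each interior vertex 'L', 'R', or 'S' (straight)."""
    labels = {}
    for i in range(1, len(path) - 1):
        c = cross(path[i - 1], path[i], path[i + 1])
        labels[i] = "L" if c > 0 else "R" if c < 0 else "S"
    return labels


def inflection_edges(path):
    """Indices i such that edge (q[i], q[i+1]) joins a left and a right turn."""
    labels = turn_labels(path)
    return [
        i
        for i in range(1, len(path) - 2)
        if {labels[i], labels[i + 1]} == {"L", "R"}
    ]
```

For a staircase such as `[(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]` every interior edge is an inflection edge, while a path that keeps turning the same way has none.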



5.3 Normalization 

We introduce normalization tools to update a path. We show that a path $\pi$ that
is not supported can be normalized. If an edge $q_i q_{i+1}$ whose endpoints make left
turns is not supported, then we slide $q_i q_{i+1}$ until either it hits a barrier point
or it reaches one of the endpoints $q_{i-1}$ or $q_{i+2}$, see Fig. 5 (a) and (b). The start
edge can be normalized by sliding $q_1$ toward $q_2$, see Fig. 5 (c).

Fig. 5. Normalization. (a) Path $q_{i-1} q_i q_{i+1} q_{i+2}$ is changed to $q_{i-1} q'_i q'_{i+1} q_{i+2}$, (b) path
$q_{i-1} q_i q_{i+1} q_{i+2}$ is changed to $q_{i-1} q_i q_{i+2}$, and (c) path $q_0 q_1 q_2$ is changed to $q_0 q'_1 q_2$.

Normalization of an edge with different turns at its endpoints is more complicated. Let $q_i q_{i+1}$ be
such an edge. We can assume that both the triangle $q_{i-1} q_i q_{i+1}$ and the triangle
$q_i q_{i+1} q_{i+2}$ contain barrier points, otherwise we shortcut the path by trading two
edges for one. If the union of the two triangles is a convex quadrangle, then we
replace the edge $q_i q_{i+1}$ by the common tangent of the two barrier sets in the triangles
$q_{i-1} q_i q_{i+1}$ and $q_i q_{i+1} q_{i+2}$, see Fig. 6 (a) (note that if one or both barrier sets are
missing we can reduce the number of edges by one or two). If the quadrangle
$q_{i-1} q_i q_{i+1} q_{i+2}$ is not convex then we truncate one of the triangles and use the
above argument; for example, we consider the triangles $q_{i-1} q_i q_{i+1}$ and $q_i q_{i+1} q'_{i+2}$
in the case depicted in Fig. 6 (b).

A path is normalized if none of the normalization tools can be applied to it.

Lemma 3. Let $\pi'$ be a normalized path obtained from a path $\pi$. Then $\pi'$ is
supported and has no more edges than $\pi$.

5.4 Inflection Edges 

Let $S(\alpha)$ denote the set of support edges of inflection edges of a path $\alpha$.

Theorem 5. Let $\pi'$ and $\pi''$ be two supported paths homotopic to $\pi$. They have
the same set of support edges of inflection edges, $S(\pi') = S(\pi'')$, and the edges of
$S(\pi')$ occur in the same order in the paths $\pi'$ and $\pi''$.

Fig. 6. Normalization. (a) Common tangent and the path, (b) changing
triangle $q_i q_{i+1} q_{i+2}$ by $q_i q_{i+1} q'_{i+2}$.




We show that the paths between inflection edges of $\pi'$ can be further normalized,
producing a unique path.

5.5 Canonical Minimum-Link Path and Homotopy Code 

Let $q_i q_{i+1}$ and $q_j q_{j+1}$, $i < j$, be two inflection edges of a supported path. Let
$\alpha = q_{i+1} q_{i+2} \cdots q_{j-1} q_j$ be the path between the two edges. We also assume that
there are no inflection edges in $\alpha$. The vertices $q_{i+1}, \ldots, q_j$ make the same turn
and we assume that it is a right turn, see Fig. 7. We find a point $a$ on the ray
$q_i q_{i+1}$ and a point $b$ on $\alpha$ such that the paths $q_{i+1} q_{i+2} \cdots b$ and $q_{i+1} a b$ are
homotopic. In other words, there are no barrier points between these paths, see
Fig. 7. We optimize $ab$ so that the path $q_{i+1} b$ has maximum length. We can
view this as a travel of $b$ along $\alpha$ starting from $q_{i+2}$. The travel of the point
$b$ can be stopped if two barrier points on $ab$ are found as in Fig. 7. The
travel of $b$ is restricted by a point on $\alpha$ such that a line parallel to $q_i q_{i+1}$ is
tangent to $b$ (this may happen if $\alpha$ winds several times). We substitute the
path $q_{i+1} q_{i+2} \ldots b$ of $\alpha$ by $q_{i+1} a b$ without increasing the number of edges. If the
path $ab$ can be extended to reach the ray $q_{j+1} q_j$ preserving homotopy then we use
the extension. Otherwise we continue this process on the path $b q_j$. This gives
the canonical minimum-link path homotopic to $\pi$. The construction can be viewed
as the greedy path with maximum extensions. This is similar to the algorithms
by Ghosh [7] and Hershberger and Snoeyink [8].

Theorem 6. Canonical minimum-link path homotopic to a path tt can he com- 
puted in + klog^ n) time. 

A standard tradeoff technique can be used in the case k < see for example
[11,2]. The running time can be reduced to $O((n + k + (nk)^{2/3})\,\mathrm{polylog}(n))$.

We generate a homotopy code using the canonical minimum-link path $\pi'$
homotopic to $\pi$ as follows. Each edge of $\pi'$ is supported by two vertices. We
store the vertices and the corresponding sides of the edge in a list. Each item
takes $O(\log n)$ bits using indices of barriers (the sides take two bits). The total
size is $O(k\log n)$. We conclude the following theorem.

Theorem 7. There is a homotopy code for paths in the plane using $O(k\log n)$
bits. The homotopy code of a path can be computed in
$O((n + k + (nk)^{2/3})\,\mathrm{polylog}(n))$ time.

Abstract. We consider the problem of coding labeled trees by means
of strings of node labels and we present a unified approach based on a
reduction of both coding and decoding to integer (radix) sorting. Applying
this approach to four well-known codes introduced by Prüfer [18],
Neville [17], and Deo and Micikevicius [5], we close some open problems.
With respect to coding, our general sequential algorithm requires optimal
linear time, thus solving the problem of optimally computing the
second code presented by Neville. The algorithm can be parallelized on
the EREW PRAM model, so as to work in $O(\log n)$ time using $O(n)$ or
$O(n\sqrt{\log n})$ operations, depending on the code.

With respect to decoding, the problem of finding an optimal sequential
algorithm for the second Neville code was also open, and our general
scheme solves it. Furthermore, in a parallel setting our scheme yields the
first efficient decoding algorithms for the codes in [5] and [17].



1 Introduction 

Labeled trees are of interest in practical and theoretical areas of computer science.
For example, Ethernet has a unique path between terminal devices, thus
being a tree: labeling the tree nodes is necessary to uniquely identify each device
in the network. An interesting alternative to the usual representations of tree
data structures in computer memories is based on coding labeled trees by means
of strings of node labels. This representation was first used in the proof of Cayley's
theorem [2,18] to show a one-to-one correspondence between free labeled
trees on $n$ nodes and strings of length $n - 2$. In addition to this purely mathematical
use, string-based codings of trees have many practical applications. For
instance, they make it possible to generate random uniformly distributed trees
and random connected graphs [14]: the generation of a random string followed
by the use of a fast decoding algorithm is typically more efficient than random
tree generation by the addition of edges, since in the latter case one must take
care not to introduce cycles. In addition, tree codes are employed in genetic
algorithms, where chromosomes in the population are represented as strings of
integers, and in heuristics for computing minimum spanning trees with additional
constraints, e.g., on the number of leaves or on the diameter of the tree
itself [7,8,20]. Finally, tree codes are used for data compression [19] and for
computing the tree and forest volumes of graphs [13].

Tree codes. We now survey the main tree codes known in the literature, referring
the interested reader to the taxonomy in [6] for further details. We assume
that we deal with a labeled $n$-node rooted tree $T$ whose nodes have distinct labels
from $[1, n]$. All the codes that we discuss are obtained by progressively updating
the tree through the deletion of leaves: when a leaf is eliminated, the label of its
parent is added to the code.

The oldest and most famous code is due to Prüfer [18] and always deletes
the leaf with the smallest label. In 1953, Neville [17] presented three different codes,
the first of which coincides with Prüfer's. The second Neville's code, before
updating $T$, eliminates all the leaves ordered by increasing labels. The third
Neville's code works by deleting chains. We call pending chain a path $u_1, \ldots, u_k$
of the tree such that the starting point $u_1$ is a leaf, and, for each $i \in [1, k - 1]$, the
elimination of $u_i$ makes $u_{i+1}$ a leaf: the code works by iteratively eliminating
the pending chain with the smallest starting point. Quite recently, Deo and
Micikevicius [5] suggested the following coding approach: at the first iteration,
eliminate all tree leaves as in the second Neville's code, then delete the remaining
nodes in the order in which they assume degree 1. For brevity, we will denote
the codes introduced above with PR, N2, N3, and DM, respectively.
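
The definitions above can be turned directly into naive simulations of the leaf-deletion process. The sketch below is only an illustration, not the efficient algorithm of the paper: it assumes the tree is rooted and given as an array `parent` indexed by labels `1..n` (index 0 unused, the entry for the root ignored), and it treats a non-root node with no remaining children as a leaf. Code DM is omitted, and the function names are hypothetical.

```python
def child_counts(parent, root):
    """Number of children of every node in the rooted tree."""
    n = len(parent) - 1
    count = [0] * (n + 1)
    for v in range(1, n + 1):
        if v != root:
            count[parent[v]] += 1
    return count


def prufer_code(parent, root):
    """PR: repeatedly delete the leaf with the smallest label."""
    n = len(parent) - 1
    count = child_counts(parent, root)
    alive = set(range(1, n + 1)) - {root}
    code = []
    while alive:
        leaf = min(v for v in alive if count[v] == 0)
        alive.remove(leaf)
        count[parent[leaf]] -= 1
        code.append(parent[leaf])
    return code


def neville2_code(parent, root):
    """N2: delete all current leaves by increasing label, then update."""
    n = len(parent) - 1
    count = child_counts(parent, root)
    alive = set(range(1, n + 1)) - {root}
    code = []
    while alive:
        leaves = sorted(v for v in alive if count[v] == 0)
        for leaf in leaves:
            alive.remove(leaf)
            count[parent[leaf]] -= 1
            code.append(parent[leaf])
    return code


def neville3_code(parent, root):
    """N3: delete pending chains, smallest starting leaf first."""
    n = len(parent) - 1
    count = child_counts(parent, root)
    leaves = [v for v in range(1, n + 1) if v != root and count[v] == 0]
    code = []
    for v in leaves:
        # Walk up while the current node has become a leaf.
        while v != root and count[v] == 0:
            code.append(parent[v])
            count[parent[v]] -= 1
            v = parent[v]
    return code
```

For the 5-node tree with edges 3-1, 1-5, 4-2, 2-5 rooted at 5, i.e. `parent = [0, 5, 5, 1, 2, 0]`, the three functions return `[1, 5, 2, 5]`, `[1, 2, 5, 5]`, and `[1, 5, 2, 5]`. These simulations take quadratic time in the worst case; they are useful mainly as a reference against which faster implementations can be checked.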

All these codes have length $n - 1$ and the last element is the root of the tree.
If the tree is unrooted, the elimination scheme implicitly defines a root for it.
Actually, it is easy to see that the last element in the code is: a) the maximum
node label (i.e., $n$) for Prüfer's code; b) the maximum label of a center of the
tree for the second Neville's code; c) the label of the maximum leaf for the third
Neville's code; d) the label of any tree center for Deo and Micikevicius's code. In
cases a), c), and d), the value of the last element can be univocally determined
from the code and thus the code length can be reduced to $n - 2$. We remark
that codes PR and DM were originally presented for free trees, while Neville's
codes were generalized to free trees by Moon [16].

Related work. A linear time algorithm for computing Prüfer codes is presented
in [3]. The algorithm can be easily adapted to the third Neville's code. Deo and
Micikevicius give a linear time algorithm for code DM based on a quite different
approach. As stated in [6], no $O(n)$ time algorithm for the second Neville's code
was known so far: sorting the leaves before each tree update yields indeed an
$O(n\log n)$ bound. An optimal parallel algorithm for computing Prüfer codes,
which improves over a previous result due to Greenlaw and Petreschi [10], is
given in [9]. A few simple changes make the algorithm work also for code N3.
Efficient, but not optimal, parallel algorithms for codes N2 and DM are presented
in [7]. A simple, non-optimal scheme for constructing a tree $T$ from a Prüfer
code is presented in [6]. The scheme can be promptly generalized for building $T$
starting from any of the other codes: this implies decoding in linear time codes N3
and DM, and in $O(n\log n)$ time codes PR and N2. An $O(n)$ time decoding algorithm
for PR is described in [9]. In a parallel setting, Wang, Chen, and Liu [19] propose
an $O(\log n)$ time decoding algorithm for Prüfer codes using $O(n)$ processors on
the EREW PRAM computational model. To the best of our knowledge, parallel
decoding algorithms for the other codes were not known in the literature until
this paper.

Our contribution. We show that both coding and decoding can be reduced to 
integer (radix) sorting. Based on this reduction, we present a unified approach 
that works for all the codes introduced so far and can be applied both in a 
sequential and in a parallel setting. The coding scheme is based on the definition 
of pairs associated to the nodes of T according to criteria dependent on the 
specific code: the coding problem is then reduced to the problem of sorting these 
pairs in lexicographic order. The decoding scheme is based on the computation 
of the rightmost occurrence of each label in the code: this is also reduced to 
integer radix sorting. 

Concerning coding, our general sequential algorithm requires optimal linear
time for all the presented codes; in particular, it solves the problem of computing
the second Neville code in time $O(n)$, which was still open [6]. The algorithm
can be parallelized, and its parallel version either matches or improves by a
factor $O(\sqrt{\log n})$ the performance of the best ad-hoc approaches known so far.
Concerning decoding, we design the first parallel algorithm for codes N2, N3,
and DM, working on the EREW PRAM model in $O(\log n)$ time and $O(n\sqrt{\log n})$
operations (with respect to PR, our algorithm matches the performance of the
best previous result). Our parallel results both for coding and for decoding are
summarized in the following table:





      Coding                                   Decoding
      before             this paper            before               this paper
PR    $O(n)$ [9]         $O(n)$                $O(n\log n)$ [19]    $O(n\log n)$
N2    $O(n\log n)$ [7]   $O(n\sqrt{\log n})$   open                 $O(n\sqrt{\log n})$
N3    $O(n)$ [7,9]       $O(n)$                open                 $O(n\sqrt{\log n})$
DM    $O(n\log n)$ [7]   $O(n\sqrt{\log n})$   open                 $O(n\sqrt{\log n})$



where costs are expressed in terms of number of operations. We remark that 
the problem of finding an optimal sequential decoding algorithm for code N2 
was also open, and our general scheme solves it optimally. Hence, we show that 
labeled trees can be coded and decoded in linear sequential time independently 
of the specific code. Due to lack of space, we omit many details in this extended 
abstract. 

2 A Unified Coding Algorithm 

Many sequential and parallel coding algorithms have been presented in the lit- 
erature [3,5,6,9,10,19], but all of them strongly depend on the properties of the 
code which has to be computed and thus are very different from each other. In 
this section we show a unified approach that works for all the codes introduced 
in Section 1 and can be used both in a sequential and in a parallel setting.

Table 1. Pair associated to node $v$ for different codes.

        PR                N2       N3                    DM
$x_v$   $\mu(v)$          $l(v)$   $\lambda(v)$          $l(v)$
$y_v$   $d(\mu(v), v)$    $v$      $d(\lambda(v), v)$    $\gamma(v)$



Namely, we associate each tree node with a pair of integer numbers and we sort
nodes using such pairs as keys. The obtained ordering corresponds to the order
in which nodes are removed from the tree and can thus be used to compute the
code. In the rest of this section we show how different pair choices yield Prüfer,
Neville, and Deo and Micikevicius codes, respectively. We then present a linear
time sequential coding algorithm and its parallelization on the EREW PRAM
model. The parallel algorithm works in $O(\log n)$ time and requires either $O(n)$
or $O(n\sqrt{\log n})$ operations, depending on the code.

Coding by sorting pairs. Let $T$ be a rooted labeled $n$-node tree. If $T$ is not
rooted, we choose a root $r$ as in points a) - d) in Section 1. Let $u, v$ be any two
nodes of tree $T$. Let us call: $T_v$, the subtree of $T$ rooted at $v$; $d(u, v)$, the distance
between any two nodes $u$ and $v$ ($d(v, v) = 0$); $l(v)$, the level of a node $v$, i.e., the
maximum distance of $v$ from a leaf in $T_v$; $\mu(v)$, the maximum label among nodes
in $T_v$; $\lambda(v)$, the maximum label among leaves in $T_v$; $\gamma(v)$, the maximum label
among the leaves in $T_v$ at maximum distance from $v$; $(x_v, y_v)$, a pair associated
to node $v$ according to the specific code as shown in Table 1; $P$, the set of pairs
$(x_v, y_v)$. The following lemma establishes a correspondence between the set $P$
of pairs and the order in which nodes are removed from the tree. Due to lack of
space, we defer the proof to the extended version of this paper.

Lemma 1. For each code, the lexicographic ordering of the pairs $(x_v, y_v)$ in set
$P$ corresponds to the order in which nodes are removed from tree $T$ according to
the code definition.

Before describing the sequential and parallel algorithms, note that it is easy
to sort the pairs $(x_v, y_v)$ used in the coding scheme. Indeed, independently of
the code, each element in such pairs is in the range $[1, n]$. A radix-sort like
approach [4] is thus sufficient to sort them according to $y_v$ first, and $x_v$ later.
In Figure 1 the pairs relative to the four codes are presented. The tree used in
the example is the same in the four cases and is rooted according to points a) -
d) in Section 1. Bold arcs in the trees related to codes PR and N3 indicate chains¹
and pending chains, respectively; dashed lines in the trees related to codes N2
and DM separate nodes at different levels. In each figure the string representing
the generated code is also shown.

Sequential algorithm. Using the pairs defined in Table 1, an optimal sequential 
coding algorithm is now straightforward: 

¹ According to the definition of Prüfer's code, when the node $\mu(v)$ is chosen for removal,
the only remaining subtree of $T_v$ consists of a chain from $\mu(v)$ to $v$.

Fig. 1. Pair associated to each tree node as specified in Table 1.



UNIFIED CODING ALGORITHM:
1. for each node v, compute the pair (x_v, y_v)
2. sort the tree nodes according to pairs (x_v, y_v)
3. for i = 1 to n - 1 do
4.     let v be the i-th node in the ordering
5.     append parent(v) to the code

The UNIFIED CODING ALGORITHM clearly requires linear time: the set of pairs
can be easily computed in $O(n)$ time using a post-order visit of the tree, and
two bucket-sorts can be used to implement step 2. Hence we have the following
theorem:

Theorem 1. Let $T$ be an $n$-node tree and let the pair $(x_v, y_v)$ associated to each
node $v$ of $T$ be defined as in Table 1. The UNIFIED CODING ALGORITHM computes
codes PR, N2, N3, and DM in $O(n)$ worst-case running time.
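
The steps of the unified coding algorithm can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a rooted tree given as an array `parent` indexed by labels `1..n` (index 0 unused, the entry for the root ignored), covers only codes PR, N2, and N3, and uses a reversed preorder instead of an explicit post-order visit. The function names are hypothetical.

```python
def stable_bucket_sort(items, key, max_key):
    """Stable bucket sort for integer keys in the range [0, max_key]."""
    buckets = [[] for _ in range(max_key + 1)]
    for item in items:
        buckets[key(item)].append(item)
    return [item for bucket in buckets for item in bucket]


def node_pairs(parent, root, code):
    """Compute the pair (x_v, y_v) of Table 1 for every node v."""
    n = len(parent) - 1
    children = [[] for _ in range(n + 1)]
    for v in range(1, n + 1):
        if v != root:
            children[parent[v]].append(v)

    # Iterative preorder; its reverse visits children before parents.
    preorder, stack = [], [root]
    while stack:
        v = stack.pop()
        preorder.append(v)
        stack.extend(children[v])

    mu = list(range(n + 1))   # max label in the subtree of v
    d_mu = [0] * (n + 1)      # distance from mu(v) to v
    lam = [0] * (n + 1)       # max leaf label in the subtree of v
    d_lam = [0] * (n + 1)     # distance from lambda(v) to v
    level = [0] * (n + 1)     # max distance from v to a leaf below it
    for v in reversed(preorder):
        if not children[v]:
            lam[v] = v
        for c in children[v]:
            if mu[c] > mu[v]:
                mu[v], d_mu[v] = mu[c], d_mu[c] + 1
            if lam[c] > lam[v]:
                lam[v], d_lam[v] = lam[c], d_lam[c] + 1
            level[v] = max(level[v], level[c] + 1)

    if code == "PR":
        return {v: (mu[v], d_mu[v]) for v in range(1, n + 1)}
    if code == "N2":
        return {v: (level[v], v) for v in range(1, n + 1)}
    if code == "N3":
        return {v: (lam[v], d_lam[v]) for v in range(1, n + 1)}
    raise ValueError("unsupported code: " + code)


def unified_coding(parent, root, code):
    """Sort nodes by (x_v, y_v) and emit the parents of the first n - 1."""
    n = len(parent) - 1
    pairs = node_pairs(parent, root, code)
    nodes = list(range(1, n + 1))
    nodes = stable_bucket_sort(nodes, lambda v: pairs[v][1], n)  # by y_v
    nodes = stable_bucket_sort(nodes, lambda v: pairs[v][0], n)  # then x_v
    return [parent[v] for v in nodes[: n - 1]]
```

The two stable passes, first on $y_v$ and then on $x_v$, give the lexicographic order of the pairs in linear time. For all three codes the root receives the largest pair, so it ends up last and is never emitted as a removed node.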



Parallel algorithm. We now show how to parallelize each step of the sequential
algorithm presented above. We work in the simplest PRAM model with exclusive
read and write operations (EREW [12]). The Euler tour technique makes it
possible to root the tree at the node $r$ specified in Section 1 in $O(\log n)$ time
with cost $O(n)$ [12]. The node $r$ can be easily identified; in particular, when $r$ is
the center of the tree, we refer to the approach described in [15]. The pairs given
in Table 1 can be computed in $O(\log n)$ time with cost $O(n)$ using standard
techniques, such as Euler tour, rake, and list ranking [12].

Step 3 can be trivially implemented in $O(1)$ time with cost $O(n)$. The sorting
in step 2 is thus the most expensive operation. We follow a radix-sort like
approach and use the stable integer-sorting algorithm presented in [11] as a subroutine.
This requires $O(\log n)$ time, linear space and $O(n\sqrt{\log n})$ cost on an
EREW PRAM with $O(\log n)$ word length. Under the hypothesis that the machine
word length is $O(\log^2 n)$, the cost of sorting can be reduced to $O(n)$ [11],
and so can the cost of our coding algorithm.

We remark that our algorithm solves within a unified framework the problem
of computing four different tree codes. In addition, with respect to codes N2 and
DM, it improves by an $O(\sqrt{\log n})$ factor over the best approaches known in the
literature [7]. Unfortunately, as far as we have described it, it does not match
the performance of the optimal algorithms available for codes PR [9] and N3 [7].
However, in these cases we can further reduce the cost of our algorithm to $O(n)$
by using an ad-hoc sorting procedure that benefits from the partition into chains.

Let us consider Prüfer codes first. As observed in [10], the final node ordering
can be obtained by sorting chains among each other and nodes within each chain.
In our framework, the chain ordering is given by the value $\mu(v)$, and the position
of each node within its chain by the distance $d(\mu(v), v)$. Instead of using a black-box
integer sorting procedure, we exploit the fact that we can compute optimally
the size of each chain, i.e., the number of nodes with the same $\mu(v)$. A prefix sum
computation gives, for each chain head, the number of nodes in the preceding
chains, i.e., its final position. Finally, the position of the remaining nodes is
univocally determined by summing the position of the chain head $\mu(v)$ and
the value $d(\mu(v), v)$. Similar considerations can be applied to the third Neville's
code. The following theorem summarizes our results on parallel coding:

Theorem 2. Let $T$ be an $n$-node tree and let the pair $(x_v, y_v)$ associated to each
node $v$ of $T$ be defined as in Table 1. On the EREW PRAM model, the UNIFIED
CODING ALGORITHM computes codes PR and N3 optimally, i.e., in $O(\log n)$ time
with cost $O(n)$, and codes N2 and DM in $O(\log n)$ time with cost $O(n\sqrt{\log n})$.
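
The chain-based positioning used for Prüfer codes can be illustrated sequentially. The sketch below assumes the arrays `mu` and `d_mu` (the chain head $\mu(v)$ and the distance $d(\mu(v), v)$, indexed by labels `1..n` with index 0 unused) have already been computed; it is a plain loop standing in for the parallel prefix-sum computation, and the function name is hypothetical.

```python
def prufer_positions(mu, d_mu):
    """0-based removal position of each node from chain sizes and prefix sums."""
    n = len(mu) - 1

    # Size of each chain: number of nodes sharing the same head mu(v).
    chain_size = [0] * (n + 1)
    for v in range(1, n + 1):
        chain_size[mu[v]] += 1

    # Exclusive prefix sums: nodes in chains whose head has a smaller label.
    start = [0] * (n + 1)
    running = 0
    for head in range(1, n + 1):
        start[head] = running
        running += chain_size[head]

    # A node sits d(mu(v), v) places after the head of its chain.
    return {v: start[mu[v]] + d_mu[v] for v in range(1, n + 1)}
```

Because positions are computed directly, no general-purpose sorting routine is needed; this is the property the text exploits to bring the cost down to $O(n)$ for codes PR and N3.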

3 Decoding Algorithms 

In this section we present sequential and parallel algorithms for decoding, i.e.,
for building the tree $T$ corresponding to a given code $C$. By the way $C$ is computed,
each node label in it represents the parent of a leaf eliminated from $T$. Hence, in
order to reconstruct $T$, it is sufficient to compute the ordered sequence of labels
of the eliminated leaves, say $S$: for each $i \in [1, n - 1]$, the pair $(C_i, S_i)$ will thus
be an arc in the tree. Before describing the algorithms, we argue that computing
the rightmost occurrence of a node in the code is very useful for decoding, and
we show how to obtain such information both in a sequential and in a parallel
setting.

Decoding by rightmost occurrence computation. We first observe that
the leaves of T are exactly those nodes that do not appear in the code, as they
are not parents of any node. Each internal node, say v, in general may appear in
C more than once; each appearance corresponds to the elimination of one of its
children, and therefore to decreasing the degree of v by 1. After the rightmost
occurrence in the code, v is clearly a leaf and thus becomes a candidate for being
eliminated. More formally:

$\forall v \neq r,\ \exists$ unique $j > rightmost(v, C)$ such that $S_j = v$

where r is the tree root (i.e., the last element in C) and rightmost(v, C) denotes
the index of the rightmost occurrence of node v in C. We assume that
rightmost(v, C) = 0 if v is a leaf of T.

It is easy to compute the rightmost occurrence of each node sequentially by
simply scanning code C. In parallel, we can reduce the rightmost occurrence
computation problem to a pair sorting problem. Namely, we sort in increasing
order the pairs $(C_i, i)$, for $i \in [1, n-1]$. Let us now consider the sub-sequences
of pairs with the same first element $C_i$: the second element of the last pair in
each sub-sequence is the index of the rightmost occurrence of node $C_i$ in the
code. Since each pair value is an integer in [1, n], we can use twice the stable
integer-sorting algorithm of [11]: this requires $O(\log n)$ time and $O(n\sqrt{\log n})$
cost in the EREW PRAM model. Then, each processor $p_i$ in parallel compares the
first element of the i-th pair in the sorted sequence to the first element of the
(i + 1)-th pair, deciding if this is the end of a sub-sequence or not. This requires
additional $O(1)$ time and linear cost with exclusive read and write operations.
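
The reduction to pair sorting can be illustrated with a short sequential sketch.
It is only a simulation of the idea: it uses Python's built-in comparison sort in
place of the stable integer sorting of [11], and the function name
`rightmost_by_pair_sorting` is a hypothetical name chosen for this illustration.

```python
def rightmost_by_pair_sorting(code, n):
    """Return a list r with r[v] = rightmost occurrence of label v in code.

    Positions are 1-based; r[v] = 0 when v never occurs (v is a leaf).
    Index 0 of the returned list is unused.
    """
    # Pairs (C_i, i), sorted lexicographically.
    pairs = sorted((label, i) for i, label in enumerate(code, start=1))
    rightmost = [0] * (n + 1)
    for k, (label, i) in enumerate(pairs):
        # Compare with the next pair: a change of first element (or the end
        # of the sequence) marks the end of a sub-sequence of equal labels.
        is_run_end = (k + 1 == len(pairs)) or (pairs[k + 1][0] != label)
        if is_run_end:
            rightmost[label] = i
    return rightmost
```

Each loop iteration corresponds to the comparison performed by one processor in
the parallel version.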

A unified decoding algorithm. We now describe a decoding algorithm for 
codes N3, PR, and DM that works on the rightmost occurrences and can be used 
both in a sequential and in a parallel setting. First, for each code, we show 
how the position of a node in the sequence S that we want to construct can be 
expressed as a function of rightmost. 

Third Neville code. By definition of code N3, each internal node v is eliminated
as soon as it becomes a leaf. Thus, the position of v in sequence S is
exactly rightmost(v, C) + 1. The entries of S which are still free after positioning
all the internal nodes are occupied by the leaves of T in increasing
order.

Prüfer code. Unlike the third Neville code, in code PR an internal
node v is eliminated as soon as it becomes a leaf if and only if there is no leaf
with label smaller than v. In order to test this condition, following [19], we
introduce the number of nodes with label smaller than v that become leaves
before v: $prev(v, C) = |\{u : u < v \text{ and } rightmost(u, C) < rightmost(v, C)\}|$.
Thus, the position of v in sequence S is rightmost(v, C) + 1 if and only
if rightmost(v, C) > prev(v, C). All the other nodes are assigned to the
remaining entries of S by increasing label order.

Deo and Micikevicius code. All the leaves of T, sorted by increasing labels,
are at the beginning of sequence S. Then, all the internal nodes appear in
the order in which they become leaves, i.e., sorted by increasing rightmost.
It is possible to get a closed formula giving the position of each node. For
each $i \in [1, n-1]$, let $\rho(i)$ be 1 if i is the rightmost occurrence of node $C_i$,
and 0 otherwise. Let $\sigma(i) = \sum_{j \le i} \rho(j)$. The position of an internal node v is
exactly $|leaves(T)| + \sigma(rightmost(v, C))$.

Our unified decoding algorithm is as follows: 

DECODING ALGORITHM:
1. for each node v compute rightmost(v, C)
2. for each node v except for the root do
3.     if (test(v) = true) then S[position(v)] ← v
4. let L be the list of nodes not yet assigned in increasing order
5. let P be the set of positions of S which are still empty
6. for each i = 1 to |P| do
7.     S[P[i]] ← L[i]

where test(v) and position(v) are specified in Table 2. With respect to Prüfer
codes, the algorithm is essentially the same as the one described in [19], and
we refer to [19,9] for a detailed parallel analysis. As observed in Section 1, a
linear sequential decoding algorithm for Prüfer codes is presented in [9], while
the straightforward sequential implementation of our algorithm would require
$O(n \log n)$ time. This can be easily reduced to $O(n)$ time by adapting the
DECODING ALGORITHM in such a way that the prev computation can be avoided.
With respect to codes N3 and DM, the decoding algorithm runs in linear time.
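
The seven steps of the DECODING ALGORITHM can be sketched sequentially as
follows. This is an illustrative sketch under stated assumptions, not the
authors' implementation: the function name `decode` and the string tags "N3",
"PR", "DM" are hypothetical; the test of Table 2 is applied to internal nodes
only for N3 and DM (our reading of the prose, since leaves fill the remaining
entries); and prev is computed with a sorted list, which is the straightforward
variant rather than the linear-time adaptation mentioned above.

```python
from bisect import bisect_left, insort

def decode(code, scheme):
    """Return the arcs (parent, child) of the tree encoded by `code`.

    `code` holds n-1 labels in 1..n; `scheme` is "N3", "PR" or "DM".
    """
    n = len(code) + 1
    root = code[-1]                      # the last element of C is the root

    # Step 1: rightmost occurrence of every label (0 for leaves), 1-based.
    rightmost = [0] * (n + 1)
    for i, v in enumerate(code, start=1):
        rightmost[v] = i

    # sigma[i] = number of rightmost occurrences among positions 1..i (DM).
    sigma = [0] * n
    for i in range(1, n):
        sigma[i] = sigma[i - 1] + (1 if rightmost[code[i - 1]] == i else 0)
    num_leaves = sum(1 for v in range(1, n + 1) if rightmost[v] == 0)

    S = [None] * n                       # S[1..n-1]; S[0] is unused
    smaller = []                         # sorted rightmost values of labels < v
    unassigned = []                      # list L, in increasing label order
    # Steps 2-3: place every node that passes the test of Table 2.
    for v in range(1, n + 1):
        if v == root:
            continue
        r = rightmost[v]
        if scheme == "PR":
            prev = bisect_left(smaller, r)   # labels u < v, rightmost(u) < r
            passes, pos = r > prev, r + 1
        elif scheme == "N3":
            passes, pos = r > 0, r + 1       # internal nodes only (assumption)
        elif scheme == "DM":
            passes, pos = r > 0, num_leaves + sigma[r]
        else:
            raise ValueError("unknown scheme: " + scheme)
        insort(smaller, r)
        if passes:
            S[pos] = v
        else:
            unassigned.append(v)

    # Steps 4-7: fill the empty positions with the remaining nodes in order.
    free = [p for p in range(1, n) if S[p] is None]
    for p, v in zip(free, unassigned):
        S[p] = v
    return [(code[i - 1], S[i]) for i in range(1, n)]
```

For instance, `decode([4, 1, 5, 5], "PR")` yields the arcs (4, 2), (1, 3), (5, 1),
(5, 4): node 1 passes the test and is placed at position 3, while nodes 2, 3
and 4 fill the remaining entries in increasing order.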

In parallel, $\sigma(i)$ can be computed for each i using a prefix sum operation [12].
In order to get set L in step 4, we can mark each node not yet assigned to S
and obtain its rank in L by computing prefix sums. Similarly for set P. Hence,
the most expensive step is the rightmost computation, which requires integer
sorting. This implies the following result:

Theorem 3. Let C be a string of n - 1 integers in [1, n]. Let $\mathcal{C}$ be the set of codes
PR, N3, and DM. For each $i \in [1, n-1]$, let test(C[i]) and position(C[i]) be
defined as in Table 2. For each code in $\mathcal{C}$, the DECODING ALGORITHM computes
the tree corresponding to string C in $O(n)$ sequential time. Decoding on the
EREW PRAM model requires $O(\log n)$ time with cost $O(n \log n)$ for code PR
and $O(\log n)$ time with cost $O(n\sqrt{\log n})$ for codes N3 and DM.

Second Neville code. Unlike the other codes, in code N2 the rightmost
occurrence of each node in C gives only partial information about sequence
S. Thus, we treat N2 separately in this section. We first observe that if all nodes
were assigned a level, an ordering with respect to pairs $(l(v), v)$ would give
sequence S, and thus the tree. We refer to Section 2 for details on the correctness
of this approach. We now show how to compute $l(v)$.

Table 2. Condition on node v that is checked in the DECODING ALGORITHM and
position of v as a function of rightmost(v, C).

Code   test(v)                         position(v)
N3     true                            rightmost(v, C) + 1
PR     rightmost(v, C) > prev(v, C)    rightmost(v, C) + 1
DM     true                            $|leaves(T)| + \sigma(rightmost(v, C))$

Let x be the number of leaves of T, which have level 1 and rightmost occurrence
0. Consider the first x elements of code C, say C[1], ..., C[x]. For each i,
1 ≤ i ≤ x, such that i is the rightmost occurrence of C[i], we know that node
C[i] has level 2. The same reasoning can be applied to get level-3 nodes from
level-2 nodes, and so on. With respect to the running time, a sequential scan
of code C is sufficient to compute the level of each node in linear time. Integer
sorting does the rest. Unfortunately, this approach is inherently sequential and
thus inefficient in parallel.
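
The sequential level computation can be sketched as a block-by-block scan of
the code. This is a minimal sketch under assumptions: the name `decode_n2` is
hypothetical, the input is assumed to be a valid N2 code, and Python's built-in
sort stands in for integer sorting.

```python
def decode_n2(code):
    """Return the arcs (parent, child) of the tree encoded by the N2 code."""
    n = len(code) + 1
    root = code[-1]
    rightmost = [0] * (n + 1)
    for i, v in enumerate(code, start=1):
        rightmost[v] = i

    level = [0] * (n + 1)
    for v in range(1, n + 1):
        if rightmost[v] == 0:
            level[v] = 1                 # leaves of T have level 1

    start = 1                            # first code position of current block
    count = sum(1 for v in range(1, n + 1) if level[v] == 1)
    current = 1                          # level of the nodes being eliminated
    while start <= n - 1 and count > 0:
        found = 0
        for i in range(start, min(start + count, n)):
            v = code[i - 1]
            if rightmost[v] == i:        # the last child of v is eliminated here
                level[v] = current + 1
                found += 1
        start += count                   # the next block has one entry per
        count = found                    # node of the next level
        current += 1

    # Sorting the non-root nodes by (level, label) gives sequence S.
    order = sorted((v for v in range(1, n + 1) if v != root),
                   key=lambda v: (level[v], v))
    return list(zip(code, order))
```

The block of code positions scanned in each round has exactly one entry per
node of the current level, which is why a single left-to-right pass suffices.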

Before describing our parallel approach, we note that the procedure for level
computation described above can be applied also for code DM. Indeed, let T'
be the tree obtained interpreting C as the code by Deo and Micikevicius and
let S' be the corresponding sequence: although T and T' are different, they
have the same nodes at the same levels and, both in S and S', nodes at level
i + 1 appear after nodes at level i, but are differently permuted within the level.
In view of these considerations, we are able to solve our problem in parallel,
using T' to get our missing level information. Namely, first we build tree T'
using the DECODING ALGORITHM, then we compute node levels applying the
Euler tour technique, and finally we obtain sequence S (corresponding to tree
T) by sorting the pairs $(l(v), v)$. Note that the Euler tour technique
requires a particular data structure [12] that can be built as described in [10].
The bottleneck of this procedure is the sorting of pairs of integers in [1, n], and
thus we can use the parallel integer sorting presented in [11]. We can summarize
the results concerning code N2 as follows:

Theorem 4. Let C be a string of n - 1 integers in [1, n]. The tree corresponding
to C according to code N2 can be computed in $O(n)$ sequential time and in
$O(\log n)$ time with cost $O(n\sqrt{\log n})$ on the EREW PRAM model.

4 Conclusions and Open Problems 

We have presented a unified approach for coding labeled trees by means of 
strings of node labels and have applied it to four well-known codes: PR [18], 
N2 [17], N3 [17], and DM [5]. The coding scheme is based on the definition of pairs 
associated to the nodes of the tree according to some criteria dependent on the 
specific code. The coding problem is reduced to the problem of sorting these 
pairs in lexicographic order. The decoding scheme is based on the computation 
of the rightmost occurrence of each label in the code: this is also reduced to radix 
sorting. We have applied these approaches both in a sequential and in a parallel 
setting. We have completely closed the sequential coding and decoding problem, 
showing that both operations in all the four codes can be done in linear time. 
In the parallel setting, further work is still needed in order to improve all the
non-optimal coding and decoding algorithms. We remark that any improvement
on the computation of integer sorting would yield better results for our parallel
algorithms.

Abstract. Predicting and optimizing the performance of ray shooting
is a very important problem in computer graphics due to the severe
computational demands of ray tracing and other applications, e.g., radio
propagation simulation. Aronov and Fortune were the first to guarantee
an overall performance within a constant factor of optimal in the following
model of computation: build a triangulation compatible with the
scene, and shoot rays by locating the origin and traversing until a hit is found.
Triangulations are not a very popular model in computer graphics, but
space decompositions like kd-trees and octrees are used routinely. Aronov
et al. [1] developed a cost measure for such decompositions, and proved
that it reliably predicts the average cost of ray shooting.

In this paper, we address the corresponding optimization problem, and
more generally d-dimensional trees with the cost measure of [1] as the
optimizing criterion. We give a construction of quadtrees and octrees
which yields cost $O(M)$, where M is the infimum of the cost measure
on all trees, for points or for (d - 1)-simplices. Sometimes, a balance
condition is important. (Informally, balancing ensures that adjacent
leaves have similar size.) We also show that rebalancing does not affect
the cost by more than a constant multiplicative factor, for both points
and (d - 1)-simplices. To our knowledge, these are the only results that
provide performance guarantees within an approximation factor of optimality
for 3-dimensional ray shooting with the octree model of computation.



1 Introduction 

Given a set S of objects, called a scene, the ray-shooting problem asks, given a
ray, what is the first object in S intersected by this ray. Solving this problem
is essential in answering visibility queries. Such queries are used in computer
graphics (e.g., ray tracing and radiosity techniques for photo-realistic 3D
rendering), radio- and wave-propagation simulation, and a host of other practical
problems.

A popular approach to speed up ray-shooting queries is to construct a space 
decomposition such as a quadtree in 2D and an octree in 3D. The query is then 
answered by traversing the leaves of the tree as they are intersected by the ray, 
and for each cell in turn, testing for an intersection between the ray and the 
subset of objects intersecting that cell. The performance of such an approach 
greatly depends on the quality of that space decomposition. 

Unfortunately, not much is understood about how to measure this quality. 
Practitioners use a host of heuristics and parameters of the scene, of which the
object count is less important than, e.g., the size of the objects in the scene, 
and other properties of the object distribution (density, depth complexity, sur- 
face area of the subdivision). Those parameters are used to develop automatic 
termination criteria for recursively constructing the decompositions. While they 
perform acceptably well most of the time, none of these heuristics performs 
better than the brute-force method in the worst case. More importantly, occa- 
sionally the termination criteria will produce a bad decomposition, and in any 
case there is no way to know the quality of the decomposition because lower 
bounds are hard to come by. 

Our results. In [1], we proposed a measure for bounded-degree space
decompositions, based on the surface area heuristic, which is a simplification (for
practicality) of a more complicated but theoretically sound cost measure: under
certain assumptions on the ray distribution, the cost measure provably reflects the
cost of shooting an average ray using the space decomposition. This has been
experimentally verified [1,2].

In [6] and in this paper, we are interested in constructing trees with cost
as low as possible, with a guaranteed approximation ratio. The only objects we
consider are simplices (points and segments inside the unit square $[0,1]^2$ in
$\mathbb{R}^2$, or points, segments and triangles inside the unit cube $[0,1]^3$ in $\mathbb{R}^3$).
We assume the Real-RAM model so as to avoid a discussion on the bit-length of
the coordinates. We give and analyze algorithms that produce trees with cost
$O(M)$, where M is a lower bound on the cost of any tree. The novelty with respect
to [6] is the extension of the results to d = 3 and higher. We also examine the
effect of rebalancing the tree on the cost measure, and prove that rebalancing only
increases the cost by a constant multiplicative factor.

Related work. The work on quadtrees and octrees in the mesh generation and
graphics community (see the book by Samet [10], the thesis of Moore [8], or
the survey by Bern and Eppstein [5] for references) is usually concerned with
a tradeoff between the size of the tree and its accuracy with respect to a
certain measure (that usually evaluates a maximum interpolation error). It is
not relevant here.

There is, however, a rich history of data-structure optimization for ray shooting
in computer graphics. Cost measures have been proposed for ray shooting in
octrees by McDonald and Booth [7], Reinhard et al. [9], Whang et al. [12],
and for other structures, such as bounding volume hierarchies, BSP-trees, uniform
grids, and hierarchical uniform grids (see [1] and references therein). All of these
approaches use heuristic criteria (sometimes very effectively) but none offers
theoretical guarantees.



2 General Cost Measure Results 

The following cost measure was introduced by Aronov et al. [1] for the purpose
of predicting the traversal cost of shooting a ray in $\mathbb{R}^d$ while using a quadtree
(for d = 2) or an octree (for d = 3) to store S:

$$c_S(T) = \sum_{\sigma \in \mathcal{L}(T)} \left(\gamma + |S \cap \sigma|\right) \times \lambda_{d-1}(\sigma), \qquad (1)$$

where $\mathcal{L}(T)$ is the set of leaves of the quadtree, $S \cap \sigma$ is the set of scene objects
meeting a leaf $\sigma$, and $\lambda_{d-1}(\sigma)$ is the perimeter length (if d = 2) or surface area
(if d = 3) of $\sigma$.
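
As an illustration of (1), the following sketch evaluates the cost measure for a
quadtree (d = 2) storing points. The representation is an assumption made for
this example only: a leaf is a triple (x, y, size) describing the closed square
[x, x + size] × [y, y + size], a point on a cell boundary is counted in every cell
it touches, and `quadtree_cost` is a hypothetical name.

```python
def quadtree_cost(leaves, points, gamma):
    """Cost measure (1) for d = 2: sum over leaves of (gamma + #points) * perimeter."""
    total = 0.0
    for (x, y, size) in leaves:
        # Number of scene points meeting the closed cell.
        meeting = sum(1 for (px, py) in points
                      if x <= px <= x + size and y <= py <= y + size)
        total += (gamma + meeting) * 4 * size   # perimeter of a square is 4 * side
    return total


points = [(0.25, 0.25), (0.75, 0.75)]
unit_cell = [(0.0, 0.0, 1.0)]
quadrants = [(0.0, 0.0, 0.5), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.5)]
cost_unit = quadtree_cost(unit_cell, points, 0.5)   # (0.5 + 2) * 4
cost_split = quadtree_cost(quadrants, points, 0.5)  # 3 + 1 + 1 + 3
```

With $\gamma = 0.5$ the unit cell costs 10 and the four quadrants cost 8, so one
subdivision pays off for this scene.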

This cost function provably models the cost of finding all the objects intersected
by a random line, with respect to the rigid-motion invariant distribution
of lines [1]. Here is an overly simplified explanation why: when shooting a ray,
the octree is traversed and all the objects in a traversed leaf are tested against
the ray to find the first hit. The cost in a leaf $\sigma$ is thus $O(\gamma + |S \cap \sigma|)$. The
coefficient $\gamma$ depends on the implementation, and models the ratio of the cost of
the tree traversal (per cell) to that of a ray-object intersection test (per test).¹
Integral geometry tells us that a random ray will intersect $\sigma$ with probability
$\lambda_{d-1}(\sigma)$ (this is not quite true; read [1] for the niceties). Hence the average cost
of ray shooting is given by (1) as claimed.

Tree and object costs. Observe that the cost measure can be split into two
terms: $C_t(T) = \gamma \lambda_{d-1}(T) = \gamma \sum_{\sigma \in \mathcal{L}(T)} \lambda_{d-1}(\sigma)$ (the tree cost), and
$C_o(T) = \sum_{s \in S} \lambda_{d-1}(s \cap T)$ (the object cost), where $s \cap T$ denotes the set of
leaves of T crossed by s and $\lambda_{d-1}$ is extended to sets of leaves by summation. It is
useful to keep in mind the following simple observations: when subdividing a cell
$\sigma$, the total tree cost of its children is twice the tree cost of $\sigma$, and the object
cost of an object is multiplied by $m/2^{d-1}$, where $m \in [1 \ldots 2^d]$ is the number of
children intersected by the object. Note that m ≤ 3 for a segment in 2D and m ≤ 7
for a triangle in 3D (unless they pass through the center of the cell). As the tree
grows finer, the tree cost increases while the object cost presumably decreases.
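
Both observations follow from a scaling argument, sketched below under the
assumption that each of the $2^d$ children of $\sigma$ is a cube of half the side
length, so that its $(d-1)$-dimensional measure is $\lambda_{d-1}(\sigma)/2^{d-1}$.

```latex
\begin{align*}
\sum_{\sigma' \text{ child of } \sigma} \gamma\,\lambda_{d-1}(\sigma')
  &= 2^{d}\cdot\gamma\,\frac{\lambda_{d-1}(\sigma)}{2^{d-1}}
   = 2\,\gamma\,\lambda_{d-1}(\sigma), \\
\sum_{\sigma' \text{ child of } \sigma,\ \sigma' \cap s \neq \emptyset} \lambda_{d-1}(\sigma')
  &= m\cdot\frac{\lambda_{d-1}(\sigma)}{2^{d-1}}
   = \frac{m}{2^{d-1}}\,\lambda_{d-1}(\sigma).
\end{align*}
```

For example, in 2D a segment meeting m = 3 children has its object cost
multiplied by 3/2, whereas a point in the interior of a single child (m = 1) has
its object cost halved.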

Lemma 1. For any set S of simplices in $[0,1]^d$,
$c(T) \ge 2d\gamma + d\sqrt{2} \sum_{s \in S} \lambda_{d-1}(s)$, for any d ≥ 2.

¹ In [2], we show how to choose $\gamma$ to reliably get the best cost possible.

Proof. The tree cost cannot be less than $\lambda_{d-1}([0,1]^d)\gamma = 2d\gamma$, and the object
cost cannot be less than $\sum_{s \in S} \lambda_{d-1}(s)$. We can improve this lower bound
further by noting that any leaf $\sigma$ that is intersected by an object s has area
at least $d\sqrt{2}$ times $\lambda_{d-1}(s \cap \sigma)$. Indeed, the smallest ratio
$\lambda_{d-1}(\sigma)/\lambda_{d-1}(s \cap \sigma)$ happens when s maximizes $\lambda_{d-1}(s \cap \sigma)$; this happens
for a diagonal segment of length $\sqrt{2}$ for the unit square (of perimeter 4), and
for a maximal rectangular section of area $\sqrt{2}$ for the unit cube (of area 6). In
fact, the maximal section of the unit d-cube is $\sqrt{2}$ [4], hence the ratio is at
least $2d/\sqrt{2} = d\sqrt{2}$ in any dimension. ∎



3 Tree Construction Schemes 



All we said so far was independent of the particular algorithm used to construct 
the tree. In this section, we introduce several construction schemes and explore 
their basic properties. 



Terminology and notation. We follow the same terminology as [6], and generalize
it to encompass any dimension. For the d-cube $[0,1]^d$ and the cells of the
decomposition, we borrow the usual terminology of polytopes (vertex, facet,
h-face, etc.). The square is a quadtree that has a single leaf (no subdivision), the
cube is an octree with a single leaf, and the d-cube is a single-leaf tree (for any
d). We call this tree the unit cell and denote it by [...]. If we subdivide
this leaf recursively until depth k, we get a complete tree (of depth k), denoted
by $T_k^{(\mathrm{complete})}$; its leaves form a regular d-dimensional grid with $2^k$ cells
on each side. In a quadtree, if only the cells incident to one facet (resp. d facets
sharing a vertex, or touching any of the 2d facets) of a cell are subdivided, and
this recursively until depth k, the subtree rooted at that cell is called a k-side
(resp. k-corner and k-border) tree, and denoted by $T_k^{(\mathrm{side})}$ (resp.
$T_k^{(\mathrm{corner})}$ and $T_k^{(\mathrm{border})}$); see Figure 1 for an illustration of the 2D case.
In higher dimensions, there are other cases (one for each dimension between 1 and
d - 2). All this notation is extended to starting from a cell $\sigma$ instead of a unit
cell, by substituting $\sigma$ for T: for instance, the complete subtree of depth k
subdividing $\sigma$ is denoted by $\sigma_k^{(\mathrm{complete})}$.

The subdivision operation induces a partial ordering $\prec$ on trees, whose minimum
is the unit cell. Again, this partial ordering is extended to subtrees of a
fixed cell $\sigma$.

We consider recursive algorithms for computing a tree of a given set S of
objects, which subdivide each cell until some given termination criterion is
satisfied. In particular, we may recursively subdivide the unit cell until each leaf
meets at most one object. We call this the separation criterion, and the resulting
tree the minimum separating tree, denoted $T^{(\mathrm{sep})}(S)$, with variants where the
recursion stops at depth k, denoted $T_k^{(\mathrm{sep})}(S)$. (Note that the depth of
$T^{(\mathrm{sep})}$ is always infinite if any two simplices intersect.) In 3D, for
non-intersecting triangles, a variant of [3] stops the recursive subdivision also
when no triangle edge intersects the leaf (but any number of non-intersecting
triangle interiors may slice the leaf). We will not analyze this variant in this
paper.

Fig. 1. The k-side quadtree $Q_k^{(\mathrm{side})}$ (left), a corner $Q_k^{(\mathrm{corner})}$ (center),
and a border $Q_k^{(\mathrm{border})}$ (right).

Dynamic programming and greedy strategies. As introduced in [6], the dynamic
programming algorithm finds the tree that minimizes the cost over all trees with
depth at most k, which we denote by $T_k^{(\mathrm{opt})}(S)$ (or $\sigma_k^{(\mathrm{opt})}(S)$ if we start
from a cell $\sigma$ instead of the unit cell): the algorithm starts with the complete
tree $T_k^{(\mathrm{complete})}$, and simply performs a bottom-up traversal of all the nodes,
while maintaining the optimum cost of a tree rooted at that node. The decision
whether to keep the subtree of a cell or prune it is based on the cost of the cell
vs. the sum of the optimum costs of the subtrees rooted at its $2^d$ children.
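
The keep-or-prune decision can be sketched for points in the plane as follows.
This is an illustrative sketch, not the implementation of [6]: it is written as
a top-down recursion, which computes the same optimum as the bottom-up traversal
of the complete tree; cells are hypothetical triples (x, y, size) with closed
boundaries; and `best_tree` is a name chosen for this example.

```python
def best_tree(points, gamma, k, cell=(0.0, 0.0, 1.0)):
    """Return (cost, leaves) of a cheapest quadtree of depth <= k on `cell`."""
    x, y, size = cell
    inside = [(px, py) for (px, py) in points
              if x <= px <= x + size and y <= py <= y + size]
    leaf_cost = (gamma + len(inside)) * 4 * size   # cost of the unsubdivided cell
    if k == 0:
        return leaf_cost, [cell]
    half = size / 2
    split_cost, split_leaves = 0.0, []
    for cx in (x, x + half):
        for cy in (y, y + half):
            # Optimum cost of the subtree rooted at each of the 4 children.
            c, leaves = best_tree(inside, gamma, k - 1, (cx, cy, half))
            split_cost += c
            split_leaves += leaves
    # Keep the subdivision only if it is strictly cheaper than the cell itself.
    if split_cost < leaf_cost:
        return split_cost, split_leaves
    return leaf_cost, [cell]
```

With the two points (0.25, 0.25) and (0.75, 0.75) and $\gamma = 0.5$, depth 1
already reaches cost 8, and allowing depth 2 does not improve on it.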

Unfortunately, the memory requirements of this algorithm are huge for large
values of k (although they remain polynomial if $k = O(\log n)$; see next section).
Therefore we also propose a greedy strategy with bounded lookahead: the algorithm
proceeds by recursively subdividing the nodes with a greedy termination
criterion: when examining a cell $\sigma$, we run the dynamic programming within $\sigma$
with depth k (k is a parameter called lookahead). If the best subtree
$\sigma_k^{(\mathrm{opt})}(S)$ does not improve the cost of the unsubdivided node $\sigma$, then the
recursion terminates. Otherwise, we replace $\sigma$ by the subtree $\sigma_k^{(\mathrm{opt})}(S)$ and
recursively evaluate the criterion for the leaves of $\sigma_k^{(\mathrm{opt})}(S)$. We call this the
k-greedy strategy, and denote the resulting tree by $T^{(k\text{-greedy})}(S)$ (or
$\sigma^{(k\text{-greedy})}(S)$ if we start from a cell $\sigma$ instead of the unit cell). Note that
unlike all the other quadtrees constructed up to now, that tree could be infinite.
We use the notation $T_\ell^{(k\text{-greedy})}(S)$ to denote the tree constructed with the
k-greedy lookahead criterion combined with a maximum depth of $\ell$.
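
A minimal sketch of the k-greedy strategy with a maximum depth, for points in
the plane, is given below. All names (`cell_cost`, `best_subtree`,
`greedy_tree`) and the cell representation (x, y, size) with closed boundaries
are assumptions made for illustration; the dynamic programming step is written
as a plain recursion.

```python
def cell_cost(cell, points, gamma):
    """Cost of an unsubdivided square cell: (gamma + #points) * perimeter."""
    x, y, size = cell
    meeting = sum(1 for (px, py) in points
                  if x <= px <= x + size and y <= py <= y + size)
    return (gamma + meeting) * 4 * size

def best_subtree(cell, points, gamma, k):
    """(cost, leaves) of a cheapest subtree of depth <= k rooted at `cell`.

    Each leaf is returned together with its depth relative to `cell`.
    """
    own = cell_cost(cell, points, gamma)
    if k <= 0:
        return own, [(cell, 0)]
    x, y, size = cell
    half = size / 2
    cost, leaves = 0.0, []
    for cx in (x, x + half):
        for cy in (y, y + half):
            c, sub = best_subtree((cx, cy, half), points, gamma, k - 1)
            cost += c
            leaves += [(leaf, d + 1) for (leaf, d) in sub]
    return (cost, leaves) if cost < own else (own, [(cell, 0)])

def greedy_tree(points, gamma, k, max_depth, cell=(0.0, 0.0, 1.0), depth=0):
    """Leaves of the k-greedy tree, truncated at depth `max_depth`."""
    lookahead = min(k, max_depth - depth)      # never look below max_depth
    _, leaves = best_subtree(cell, points, gamma, lookahead)
    if len(leaves) == 1:                       # no improvement: stop here
        return [cell]
    result = []
    for leaf, d in leaves:                     # re-evaluate the criterion
        result += greedy_tree(points, gamma, k, max_depth, leaf, depth + d)
    return result
```

The depth cap guarantees termination even when the uncapped greedy tree would be
infinite; each recursive call either stops or moves at least one level deeper.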

With no lookahead (k = 1), the greedy strategy simply examines whether one
subdivision decreases the cost measure. Below, we show that this does not yield 
good trees in general. We will analyze the greedy strategies without and with 
lookahead, first for points, then for simplices. But first, we must grapple with 
the issue of infinite depth. 

Pruning beyond a given depth. The “optimal” tree may not have finite depth
(it is conceivably possible to decrease the cost by subdividing ad infinitum), so
we let M denote the infimum of $c(T)$ over all trees T for S. (As a consequence
of Lemma 1, $M \ge 2d\gamma > 0$.) In order to have an algorithm that terminates, we
usually add an extra termination criterion such as a depth limit of k.

We now show that pruning a tree at a depth of k, for some choice of
$k = \Theta(\log n)$ (to ensure that the tree has a polynomial size), increases the cost
at most by a constant factor. We first show it for arbitrary convex obstacles
(simplices in particular). Then we improve on the result for the case of points.
The proofs are not difficult and are omitted for space considerations. Nevertheless,
these considerations of depths are necessary to ensure that the computation is
meaningful.

Lemma 2. Let T be a d-dimensional tree which stores a set S of n convex
objects of dimensions not more than d - 1. For $k = \log_2 n + C$, let $T_k$ be the tree
obtained from T by removing every cell of depth greater than k. Then
$c(T_k) = O(c(T))$ and the constant does not depend on n nor S.

Remark. A choice of $k = \log_2 n + C$ ensures that $T_k$ has at most
$(2^d)^k = 2^{dC} n^d = O(n^d)$ leaves, for any fixed d. Hence the algorithm which
computes the full subdivision at depth k and then applies the dynamic programming
heuristic provably computes a tree whose cost is $O(M)$ in polynomial time, as a
consequence of Lemma 2.

As a side note, with slightly more restrictive hypotheses on T, the depth k 
can be reduced for points so that Tk has size at most = O(n^) (for 

any d> 2) and cost as close as desired to that of T. 

Lemma 3. Let T be a d-dimensional tree, which stores a set S of n simplices.
Assume that T does not contain empty internal nodes (i.e., nodes that are
subdivided but do not contain any object). Let $T_k$ be the tree obtained from T by
removing every cell of depth greater than k. Then, for every $\varepsilon > 0$ there exists
a C (that depends only on $\varepsilon$ and $\gamma$ but not on n) such that, for
$k = \log_2 n + C$, we have

$$c(T_k) \le (1 + \varepsilon)\, c(T).$$

4 Cost-Optimal Trees 

The following lemma was proven in [6] for the case d = 2. Its statement and 
proof extend straightforwardly to higher dimensions. 

Lemma 4. The lookahead greedy strategy does not always give a cost-optimal
tree. Specifically, for any k, there is a set S of n objects such that no tree of
depth at most k has cost less than $2d(\gamma + n)$, but some quadtree of depth at least
k + 1 has cost less than $2d(\gamma + n)$.

Although the lookahead greedy strategy does not produce the optimal solu- 
tion, in the counter-example of the lemma (given for d = 2 in [6]) it does give a 
good approximation. In fact, this can be proven for all scenes. 

Theorem 1. Let S be a set of flat objects in the unit cube, and let M be the
infimum of $c(T)$ over all trees T. There is an integer p (for d = 2 or d = 3,
p = 3) such that the tree $T^{(p\text{-greedy})}$ constructed by the p-greedy strategy has
cost $c(T^{(p\text{-greedy})}) = O(M)$.

Proof. The intuition is that small objects behave well, and the cost of a big
object is bounded below by a constant times its size so it cannot be reduced by
very much. Let us look at a cell $\sigma$ of the tree $T^{(p\text{-greedy})}$. We are going to show
that, when the optimal decomposition of depth at most p of a cell $\sigma$ does not
improve on the cost of $\sigma$, then the cost of $\sigma$ is $O(M_\sigma)$, where $M_\sigma$ is the
infimum cost of all the possible tree subdivisions of $\sigma$. If this holds true for
every leaf $\sigma$ of the p-greedy strategy, then $c(T^{(p\text{-greedy})}) = O(M)$ as well.

Assume there are a objects meeting at most $C_d(2^p)$ cells, and b other objects.
The cost of $\sigma$ is $(\gamma + a + b)\lambda_{d-1}(\sigma)$. Since the cost of the complete
subtree $\sigma_p^{(\mathrm{complete})}$ is at most
$\left(2^p\gamma + \frac{C_d(2^p)}{(2^p)^{d-1}}\,a + 2^p b\right)\lambda_{d-1}(\sigma)$, which we assumed to be
at least $c(\sigma)$, we have

$$c(\sigma) = (\gamma + a + b)\,\lambda_{d-1}(\sigma) \le \left(2^p\gamma + \frac{C_d(2^p)}{(2^p)^{d-1}}\,a + 2^p b\right)\lambda_{d-1}(\sigma),$$

which implies that $a \le (\gamma + b)(2^p - 1)\big/\left(1 - C_d(2^p)/(2^p)^{d-1}\right)$. We will
need a technical lemma:

Lemma 5. For every d and k, there exist constants $C_d(k)$ and $\delta_d(k) > 0$ such
that for any convex object s of dimension at most d - 1, either s intersects at
most $C_d(k)$ cells of the regular grid of side k, or else $\lambda_{d-1}(s) \ge \delta_d(k)$. We may
take Cd{k) < d^k’^~^ and C 2 .{k) = 7k — 6.

Let p be the smallest integer such that $C_d(2^p) < (2^p)^{d-1}$. By Lemma 5, an
object which belongs to more than $C_d(2^p)$ cells has measure at least $\delta_d(2^p)$, so
its contribution to the cost is at least $(d\sqrt{2}\,\delta_d(2^p))\,\lambda_{d-1}(\sigma)$. The optimal
cost $M_\sigma$ is then greater than $(\gamma + b\,d\sqrt{2}\,\delta_d(2^p))\,\lambda_{d-1}(\sigma)$. We have then
proved that $M_\sigma$ is at least a fixed fraction of the cost of $\sigma$, and the theorem
follows. ∎

Already in 2D, the separating quadtree strategy does not work as well for 
segments as for points, especially since it is not able to distinguish between a 
segment that barely intersects the corner of the square and the diagonal (in the 
first case it is usually good to subdivide, and in the second case it is not). The 
lookahead strategy is then a true improvement. 

The case of points. Arguably, the case of points is of theoretical interest only,
but has relevance since simplices are usually very small in a graphics scene
(when they come from a subdivision surface), and can be thought of as points.
This is lent credence by a recent trend: point cloud data (PCD) is becoming an
important primitive in computer graphics, and several algorithms for rendering
them have been given of late, which are amenable to our cost measure.

In the plane, the 1- and 2-greedy strategies may produce a quadtree of cost
$O(n)$ times the optimal cost, and so does 1-greedy in higher dimensions.
Nevertheless, with one more level of lookahead, everything works near-optimally.
We simply state the lemma and omit the proof, which is similar to the one given
above for simplices.

Lemma 6. The 3-greedy strategy in the plane, and the 2-greedy strategy in d
dimensions (d ≥ 3) produce near-optimal trees for points. Namely, if S is a set
of n points in the unit d-cube, and M is the infimum of $c(T)$ over all trees T,
then $c(T^{(3\text{-greedy})}) = O(M)$ for all d ≥ 2, and $c(T^{(2\text{-greedy})}) = O(M)$ for all
d ≥ 3.

As for finding the optimal quadtree, the question is still open whether for
given values of $\gamma$ (and maybe of n) there exists a k such that the lookahead
strategy yields the optimal result. All we know is that if n = 5 and $\gamma < 1$ tends
to 1, the required k tends to infinity. We can also mention that if every point
belongs to at most one cell, then k = 1 leads to the optimal tree.

5 Rebalancing Quadtrees and Octrees 

Quadtrees are used in meshing for computer graphics, and octrees are used as 
a space subdivision method for ray casting and radiosity methods, for instance. 
Both quadtrees and octrees are used in scientific computing numerical simu- 
lations (e.g., finite element methods). These are but a few applications where 
quadtrees and octrees have appeared. In all these applications, it can be im- 
portant to maintain aspect ratio (hence starting with a unit cell) and to ensure 
that two neighboring cells don’t have wildly differing sizes. This has led several 
authors to propose balancing for trees. Also from our perspective, since the cost
measure of [1] provably relates to the cost of traversal only for balanced trees,
we are interested in balancing trees as well. In this section, we prove that
rebalancing does not affect the cost by more than a multiplicative constant factor.

Definitions. Two leaves are k-adjacent if they intersect in a convex portion of
dimension k. A tree is called k-balanced if the depths of any two k-adjacent leaves
differ by at most one. Notice that when considering two k-balanced trees, their
intersection, constructed from the unit cell by subdividing all and only cells that
are subdivided in both trees, is k-balanced. Thus for a tree T, there is a unique
balanced tree $\mathrm{bal}_k(T) = \min\{T' \succ T : T' \text{ is } k\text{-balanced}\}$, which is called the
k-rebalancing of T.

For instance, 0-balanced quadtrees are what Moore called smooth
quadtrees [8], and 1-balanced what he called balanced and others called
1-irregular or restricted.

Cost analysis. While the size of $\mathrm{bal}_k(T)$ is known to increase by at most a
constant factor from the size of T, the final cost $c(\mathrm{bal}_k(T))$ was not known.
Our main result concerning cost analysis is the following, when objects in S are
either points or segments.

Theorem 2. Let T be a tree storing points and/or simplices in the unit
cube. Then for any k, 0 ≤ k < d, and d ≤ 3,

$$c(\mathrm{bal}_k(T)) \le 3^d\, c(T).$$



The result becomes c{balk{T)) = 0{d/A'^)c{'T) in higher dimensions.

The following lemma was first proven by Weiser in 2D, and by Moore for any
dimension d ≥ 2.

Lemma 7 ([8]). Let T be a tree. There is a 0-balanced tree $T' \succ T$ such that
$C_t(\mathrm{bal}_0(T)) \le 3^d C_t(T)$.

Since $T'$ must be a refinement of $\mathrm{bal}_0(T)$, which is itself a refinement of
$\mathrm{bal}_k(T)$, for any k ≥ 0, this implies that
$C_t(\mathrm{bal}_k(T)) \le C_t(\mathrm{bal}_0(T)) \le 3^d C_t(T)$.
Note that the same construction also implies the factor $3^d$ on the number of
leaves.

Next we prove in Lemma 8 that the object cost of $\mathrm{bal}_k(T)$ is at most twice
(for points) and some constant $B_d \le 3^d$ (for simplices) times that of T. The
proof is omitted for lack of space.

Lemma 8. Let T he a tree, and consider the object cost of a single object s € S 
both inT and in halk{T). If s is a point, then \d-i{sC\halk{T)) < 2Ad_i(snT). 
If s is a convex object of dimension at most d—1 (e.g. a{d— l)-simplex), then 
\d-i{s n halk{T)) < BdXd-i{snT). 

Compounding all these costs together, we have Co(balfc(T)) < 3‘^Co{T) for 
any 0 < k < d, which ends the proof of the theorem. 

Remark. It could also very well be that rebalancing actually decreases the cost. 
We don't know that, and we don't need it since we are mostly interested in trees 
for which $c(T) = O(M)$. In any case, we can ask if there is a reverse theorem 
(a lower bound on $c(\mathrm{bal}_k(T))$ in terms of $c(T)$). 

6 Conclusion 

In this paper we have proved that instead of considering the optimal octree, and 
without increasing the cost too much, we may consider the octree given by the 
lookahead strategy. Still, this may yield an infinite subdivision. In order to have 
an effective algorithm, we need to add a termination criterion such as a depth 
limit of $\log_2 n$. As we have also proven, this again increases the cost by at most 
a constant factor. In practice, we find that greedy with or without lookahead 
yields near-optimal octrees, hence the approximation ratio seems close to one.¹ 
All the results stated in this paper should extend easily to recursive grids and 
simplicial trees as well, in two and higher dimensions, with only small differences. 
However, the constants involved in the analysis would be even higher than they 
are here. 

We conclude with a few open problems: first, is it true that by pruning 
at depth $k = \Theta(\log n)$, we can approach the cost to within $1 + \varepsilon$? Since the 
optimal tree might be infinite, there is little sense in asking for an algorithm 
that constructs the optimal tree. But if the answer to the first question were 
yes, it would be nice to have a PTAS with respect to the cost measure. We 
don't know if the greedy strategy for high enough lookahead would fit the bill. 

¹ Actually, they both yield octrees of the same cost, which is the lowest cost we 
observe with other heuristics; we find it hard to believe that they would all be 
c times optimal, for some constant c > 1. 

Lastly, the cost measure considered here is simple but does not model the 
average traversal cost during ray shooting. For this, the following cost measure 
should be considered [1]: 

$$c^*(T) = \sum_{\sigma} \bigl(\gamma + |S \cap \sigma|\bigr) \times \bigl(\lambda_{d-1}(\sigma) + \lambda_{d-1}(S \cap \sigma)\bigr), \qquad (2)$$

where $\lambda_{d-1}(S \cap \sigma)$ measures the portion of the objects within $\sigma$. Our only 
result here is that the greedy strategy does not work. 
Abstract. In this extended abstract, we present approximation algorithms for the 
following packing problems: the strip packing problem, the two-dimensional bin 
packing problem, the three-dimensional strip packing problem, and the three- 
dimensional bin packing problem. For all these problems, we consider orthogonal 
packings where 90° rotations are allowed. The algorithms we show for these prob- 
lems have asymptotic performance bounds 1.613, 2.64, 2.76 and 4.89, respec- 
tively. We also present an algorithm for the z-oriented three-dimensional packing 
problem with asymptotic performance bound 2.64. To our knowledge the bounds 
presented here are the best known for each problem. 



1 Introduction 

We present approximation algorithms for packing problems allowing orthogonal 
rotations. We consider the following problems: the strip packing problem, the 
two-dimensional bin packing problem, the three-dimensional strip packing problem, 
and the three-dimensional bin packing problem. The packings must be orthogonal and 
90° rotations are allowed. The algorithms we show for these problems have asymp- 
totic performance bounds 1.613, 2.64, 2.76 and 4.89, respectively. We also present an 
algorithm for the z-oriented three-dimensional packing problem with asymptotic 
performance bound 2.64. This result improves our previous results in [10,11]. 

Approximation algorithms for the oriented version of these packing problems have 
been extensively considered, but there are very few results when rotations are considered. 
For a survey on approximation algorithms for packing problems, see [3,4]. 

In all problems considered in this paper, all items must be packed into recipients and 
they may not overlap. We consider orthogonal packings where orthogonal rotations are 
allowed, i.e. 90° rotations, around any axis. 

There are many applications in which orthogonal rotations are allowed: cutting of 
hardboard, glass, cloth (when there is no oriented pattern), foam, etc. [3]. Interesting 
applications also occur in job scheduling problems [9,11]. 

* This work has been partially supported by MCT/CNPq - Project ProNEx (Proc. 664107/97-4), 
The strip packing problem, SPP, is the following: given a list of rectangles 
$L = (r_1, \dots, r_n)$, where $r_i = (x_i, y_i)$, and a rectangle $R = (a, \infty)$, find a packing of the 
rectangles of L into R minimizing the size of the packing in the unlimited direction of R. 

In the two-dimensional bin packing problem, 2BP, we are given a list of rectangles 
$L = (r_1, \dots, r_n)$, where $r_i = (x_i, y_i)$, and two-dimensional bins $R = (a, b)$, and we 
wish to pack the rectangles of L into bins R using the smallest number of bins. 

In the three-dimensional strip packing problem, TPP, we are given a list of boxes 
$L = (b_1, \dots, b_n)$, where $b_i = (x_i, y_i, z_i)$, and a box $B = (a, b, \infty)$, and we wish to pack the 
boxes of L into B, minimizing the size of the packing in the unlimited direction of the recipient. 

The three-dimensional bin packing problem, 3BP, is defined as follows: given a 
list of boxes $L = (b_1, \dots, b_n)$, where $b_i = (x_i, y_i, z_i)$, and three-dimensional bins 
$B = (a, b, c)$, find a packing of the boxes of L into the smallest number of bins B. 

We also consider a special version of the three-dimensional strip packing problem, 
called the z-oriented three-dimensional strip packing problem, $\mathrm{TPP}^z$, which is the same as 
TPP, except that boxes are oriented in the z-axis. That is, a box can 
be rotated around the z-axis (height direction), but cannot be laid down. 
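
The difference between general orthogonal rotations and the z-oriented case can be illustrated with a small sketch. The function name `orientations` and the tuple representation `(x, y, z)` are assumptions made for this illustration only; they do not come from the paper.

```python
from itertools import permutations


def orientations(box, z_oriented=False):
    """Return the distinct orthogonal orientations of a box (x, y, z).

    Hypothetical helper: with free 90-degree rotations around any axis,
    every permutation of the three dimensions is reachable; in the
    z-oriented variant only the x and y dimensions may be swapped.
    """
    x, y, z = box
    if z_oriented:
        # Rotation around the z-axis only: the height z stays fixed.
        candidates = [(x, y, z), (y, x, z)]
    else:
        # Any sequence of 90-degree rotations permutes the dimensions.
        candidates = list(permutations(box))
    return sorted(set(candidates))
```

A box with three different side lengths thus has six orientations in TPP or 3BP, but only two in the z-oriented problem.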

In section 2, we present some notation and discuss some results when orthogonal 
rotations are allowed. In section 3, we present the approximation algorithm for the 
strip packing problem and in section 4, the approximation algorithm for the 
two-dimensional bin packing problem. In section 5, we present the results we have obtained 
for the other packing problems. In section 6, we present some concluding remarks. 



2 Preliminaries 

To define the packings, we consider the Euclidean space with the xyz coordinate 
system. An item e to be packed has its dimensions defined as x(e), y(e) and z(e), also 
denoted as the length, width and height of item e, where each of these dimensions is 
the measure in the corresponding axis of the xyz system. For the one- and the 
two-dimensional cases, some of these values are not defined. 

We denote by SPP(a), 2BP(a, b), TPP(a, b), $\mathrm{TPP}^z$(a, b) and 3BP(a, b, c) the 
corresponding problem versions with the recipient sizes defined by values a, b and c. 
We denote by $S(r_i)$ the area of the rectangle $r_i = (x_i, y_i)$ and by $V(b_i)$ the volume of the 
box $b_i = (x_i, y_i, z_i)$. Given a function $f : C \to \mathbb{R}$ and a subset $C' \subseteq C$, we denote by 
$f(C')$ the sum $\sum_{e \in C'} f(e)$. For all algorithms, we consider that the items are given in 
an initial configuration that can be packed. That is, given a box e we have $x(e) \le a$, 
$y(e) \le b$ and $z(e) \le c$. Given an item e = (a, b), we denote by $\rho(e) := (b, a)$. We also 
consider that each item dimension is not greater than a constant Z. 

The following is a convenient notation to define and restrict the input list of items: 

$X[p,q] := \{e : p \cdot a < x(e) \le q \cdot a\}$, $Y[p,q] := \{e : p \cdot b < y(e) \le q \cdot b\}$, 

$C[p_1,q_1;\, p_2,q_2] := X[p_1,q_1] \cap Y[p_2,q_2]$, $C_m := C[0,\tfrac{1}{m};\, 0,\tfrac{1}{m}]$, 

$P_1 := C[0,\tfrac12;\, 0,\tfrac12]$, $P_2 := C[0,\tfrac12;\, \tfrac12,1]$, $P_3 := C[\tfrac12,1;\, 0,\tfrac12]$, $P_4 := C[\tfrac12,1;\, \tfrac12,1]$. 
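
The interval notation above lends itself to a direct membership test. The following is a minimal sketch, assuming half-open intervals of the form p·a < x(e) ≤ q·a and a threshold of 1/2 for the four classes P1 to P4; both conventions, as well as the function names `in_X`, `in_Y`, `in_C` and `quadrant`, are assumptions made for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def in_X(e, p, q, a):
    """True if rectangle e = (x, y) satisfies p*a < x <= q*a (assumed convention)."""
    return p * a < e[0] <= q * a


def in_Y(e, p, q, b):
    """True if rectangle e = (x, y) satisfies p*b < y <= q*b (assumed convention)."""
    return p * b < e[1] <= q * b


def in_C(e, p1, q1, p2, q2, a, b):
    """Membership in C[p1,q1; p2,q2], the intersection of X[p1,q1] and Y[p2,q2]."""
    return in_X(e, p1, q1, a) and in_Y(e, p2, q2, b)


def quadrant(e, a, b):
    """Return i such that e lies in P_i, with P_1..P_4 split at one half."""
    classes = {
        1: (0, HALF, 0, HALF),
        2: (0, HALF, HALF, 1),
        3: (HALF, 1, 0, HALF),
        4: (HALF, 1, HALF, 1),
    }
    for name, (p1, q1, p2, q2) in classes.items():
        if in_C(e, p1, q1, p2, q2, a, b):
            return name
    raise ValueError("item does not fit in a bin of size (a, b)")
```

For a bin of size (10, 10), a rectangle (3, 8) is narrow and tall and falls in P2, while (7, 9) is large in both directions and falls in P4.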

Given a list L of items to be packed, and an algorithm A, we denote by A(L) the 
size of the packing generated by algorithm A when applied to list L. Such size can be 
the height of the packing or the number of bins used in the packing, depending on which 
version of packing problem we are considering. For a packing $\mathcal{P}$, we denote by $H(\mathcal{P})$ its 
height, and by $\#(\mathcal{P})$ the number of bins that is used by $\mathcal{P}$. We denote by OPT(L) the size 
of an optimum packing of L. We say that an algorithm A has asymptotic performance 
bound $\alpha$ if there exists a constant $\beta$ such that $A(L) \le \alpha \cdot \mathrm{OPT}(L) + \beta$, for every input list 
L. 

Since the one-dimensional bin-packing problem is a particular case of all problems 
considered in this paper, it follows that each problem considered here is NP-hard. 

One idea to solve problems allowing orthogonal rotations is to simply apply the 
algorithms developed for the oriented case, ignoring any possible rotation. It can be 
shown that there is no algorithm, developed in the way we described above, for the 
strip packing problem with asymptotic performance bound less than 2, and for the two- 
dimensional bin packing problem and the three-dimensional bin packing problem there 
is no known algorithm with asymptotic performance bound less than 3 (see [11]). 

Most of the known approximation results do not consider rotations, although 
some questions were posed in the early 80's. The following quote was extracted from 
a paper of F.R. Chung, M.R. Garey and D.S. Johnson [2] on the two-dimensional bin 
packing problem: 

“A second line of attack would be to design and analyze algorithms which could 
make use of the fact that, in some applications, 90° rotations of rectangles might 
be allowable. 

Algorithms which consider the possibility of rotations might well yield improve- 
ments. Can one prove worst case bounds that reflect these improvements ?” 

There are other papers in the literature that raise questions about orthogonal rotations, 
as for example, [3,5]. Although these papers are from the early 80's, very little has been 
done about orthogonal rotations. In fact, when the scale does not affect the problem, we 
can show that for any of the general packing problems considered, the version allowing 
orthogonal rotations is as hard to approximate as the oriented version. 

Theorem 1. Let $\mathrm{PROB}^r$ be one of the problems defined previously, considering 
orthogonal rotations around some of the axes x or y or z (maybe in several axes), 
$\alpha$ and $\beta$ constants and $A^r$ an algorithm such that $A^r(L) \le \alpha \cdot \mathrm{OPT}(L) + \beta$ 
for any instance L of $\mathrm{PROB}^r$. Then, we can adapt this algorithm to another 
algorithm A for a variant of $\mathrm{PROB}^r$, called PROB, where we fix the orientation of the 
items (with respect to some axis), in such a way that the following relation holds: 
$A(L) \le \alpha \cdot \mathrm{OPT}(L) + \beta$ for any instance L of PROB. 

3 Strip Packing Problem 

In this section we consider the strip packing problem. In [5], Coffman, Garey, 
Johnson and Tarjan present the algorithms NFDH (Next Fit Decreasing Height) and FFDH 
(First Fit Decreasing Height) for the oriented case and prove that their asymptotic 
performance bounds are 2 and 1.7, respectively. Another algorithm with asymptotic 
performance bound 2 is the BLDW (Bottom Leftmost Decreasing Width) algorithm presented 
by Baker, Coffman and Rivest [1]. Kenyon and Remila [8] presented an asymptotic 
approximation scheme for the oriented strip packing problem. 
When orthogonal rotations are allowed, the bound 2 of the algorithms BLDW and 
NFDH is also valid, since the proofs of the bounds are based only on area arguments. 

The algorithm we present in this section is called ASP. It uses the critical set 
combination strategy presented in [10,11]. The idea is to combine items that do not lead 
to packings with good space filling, if considered independently. We call these items 
critical items. We take two sets with critical items, called here critical sets, and generate 
a combined packing of the items in these two sets. Each algorithm that combines items 
of two critical sets has the property that at the end of this combination, one of these sets 
is totally packed in the combined packing. Moreover, the combined packing has a better 
configuration than the one we can obtain for each critical set. 

Before presenting the algorithm, we describe an algorithm used as subroutine called 
TC (Two Column) which builds a packing with two columns, each one a stack of 
rectangles packed one on top of the other. Each column is associated with only one 
critical set. The algorithm TC is called with parameters $(L_1, L_2, x_1, x_2)$, where $L_1$ and 
$L_2$ are two critical sublists and $x_1$ and $x_2$ are positions where the columns are built 
aligned to the left, from the bottom of the strip. We call height of the column the sum of 
the rectangles' heights in the corresponding column. To pack a rectangle, the algorithm 
chooses the first column with smallest height. Let h be the height of this column and $L_i$ 
the list associated with this column. If not all rectangles of $L_i$ have been packed, then the 
next rectangle of $L_i$, say r, is packed in the position $(x_i, h)$. Then, the list $L_i$ is updated 
(by removing the rectangle r). This process is repeated until one of the lists, $L_1$ or $L_2$, 
is totally packed. We assume the positions $x_1$ and $x_2$ and the lists $L_1$ and $L_2$ are such 
that they do not generate infeasible packings. Any algorithm that combines critical sets 
returns a pair $(\mathcal{P}', L')$, where $\mathcal{P}'$ is the packing generated and $L'$ is the set of rectangles 
packed in $\mathcal{P}'$. The following lemma can be proved for the algorithm TC. 

Lemma 1. Let $\mathcal{P}$ be a packing of $L' \subseteq L_1 \cup L_2$ generated by the algorithm TC when 
applied to lists $L_1$ and $L_2$ for SPP(a). If $x(r) \ge l_i \cdot a$ for all $r \in L_i$, i = 1, 2, then 
$H(\mathcal{P}) \le \frac{S(L')}{(l_1 + l_2)\, a} + Z$. 

We denote the value s in inequalities of the form $H(\mathcal{P}) \le \frac{1}{s}\frac{S(L)}{a} + Z$ as an area 
guarantee of the packing $\mathcal{P}$. The idea used in the algorithm ASP is to generate the final 
packing consisting of two parts: one associated with a partial optimum packing generated 
with rectangles with width greater than $\frac{a}{2}$, and the other associated with a packing with 
better area guarantee. 
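
The two-column subroutine TC described above can be sketched as follows. This is an illustrative reading of the textual description, not the authors' code: rectangles are assumed to be `(width, height)` tuples, and the function name `two_column` and its return format are choices made for this example.

```python
def two_column(L1, L2, x1, x2):
    """Stack rectangles from L1 at x1 and from L2 at x2 (TC-style sketch).

    Each step packs the next rectangle of the list whose column is
    currently the shortest (the first column wins ties) and stops as
    soon as one of the two lists is completely packed.
    Returns (placements, heights), where placements is a list of
    ((width, height), (x, y)) pairs and heights holds the column heights.
    """
    lists = [list(L1), list(L2)]
    xs = [x1, x2]
    heights = [0, 0]
    next_idx = [0, 0]
    placements = []
    while next_idx[0] < len(lists[0]) and next_idx[1] < len(lists[1]):
        i = 0 if heights[0] <= heights[1] else 1  # first column with smallest height
        rect = lists[i][next_idx[i]]
        placements.append((rect, (xs[i], heights[i])))
        heights[i] += rect[1]
        next_idx[i] += 1
    return placements, heights
```

Because a rectangle is always added to the shorter column, the two column heights never differ by more than the largest rectangle height Z, which is what an area argument for the height of the combined packing relies on.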

Note that in an oriented packing, if $L_1$ has only rectangles with width greater than 
$\frac{a}{2}$ then we can obtain an optimum packing only by placing one rectangle on top of the 
other. When rotations are allowed, we can also have a similar result. In this case, we 
first reorient and remove all rectangles of $L_1$ with width greater than $\frac{a}{2}$ but height at 
most $\frac{a}{2}$. In this way, the remaining rectangles are the ones that cannot be reoriented, 
or if they can, they continue with width greater than $\frac{a}{2}$. Clearly, the only way to pack 
these remaining rectangles is to pack one on top of the other. After this rotation step, 
we introduce another reorientation step for the new rectangles in $L_1$ in such a way that 
each rectangle stays with the lowest possible height, maintaining a feasible orientation. 
Thus, we can continue having a partial optimum packing only by packing one rectangle 
on top of the other. 
The algorithm TC is used to pack the critical rectangles of $L_1$ and critical rectangles 
of the remaining part. The packing of the remaining non-packed rectangles is done using 
the NFDH strategy, which consists in first sorting the rectangles in non-increasing order of 
their height and then packing the rectangles in this order, placing rectangles side by 
side generating levels. When a rectangle cannot be packed in a level it is packed in a 
new consecutive parallel level (for more details of the algorithm NFDH, see [5]). The 
following result is valid for NFDH. 

Lemma 2. Let $N_1, \dots, N_v$ be the levels generated by NFDH for a list L, in the order 
they are generated. If $w(N_i)$ is the total sum of the width of the rectangles in $N_i$, and 
there exists a constant s such that $w(N_i) \ge s \cdot a$, for $1 \le i \le v - 1$, then we have 

$\mathrm{NFDH}(L) \le \frac{1}{s}\frac{S(L)}{a} + Z$. 
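
The level-oriented NFDH strategy can be sketched in a few lines. This is a minimal illustration for the strip case, assuming rectangles are `(width, height)` tuples sorted by non-increasing height; the function name `nfdh` and the dictionary layout of a level are hypothetical.

```python
def nfdh(rects, a):
    """Next Fit Decreasing Height for a strip of width a (sketch).

    Returns (levels, total_height). Each level is a dict with its base
    y-coordinate, its height (the height of its first rectangle) and the
    placed items as ((width, height), (x, y)) pairs.
    """
    order = sorted(rects, key=lambda r: r[1], reverse=True)
    levels = []
    used = 0  # width already occupied in the current level
    for w, h in order:
        if w > a:
            raise ValueError("rectangle wider than the strip")
        if not levels or used + w > a:
            # Open a new level directly on top of the previous one.
            base = levels[-1]["base"] + levels[-1]["height"] if levels else 0
            levels.append({"base": base, "height": h, "items": []})
            used = 0
        levels[-1]["items"].append(((w, h), (used, levels[-1]["base"])))
        used += w
    total = levels[-1]["base"] + levels[-1]["height"] if levels else 0
    return levels, total
```

Since every rectangle in a level is at least as tall as the first rectangle of the next level, the area of a level with occupied width at least s·a is at least s·a times the height of the following level, which gives a height bound of the kind stated in Lemma 2.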

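Algorithm ASP(L) 

Input: List of rectangles L for SPP(a) 

1 Rotate all rectangles $r \in L$ with $x(r) > \frac{a}{2}$ and $y(r) \le \frac{a}{2}$. 

2 Rotate all rectangles $r \in L$ with $x(r) > \frac{a}{2}$ and $x(r) < y(r) \le a$. 

3 Let $p \leftarrow 1/\sqrt{6}$ and $L'_i \leftarrow \{s \in L : \frac{a}{i+1} < x(s) \le \frac{a}{i}\}$, i = 1, ..., 4, 
$L'_5 \leftarrow \{s \in L : (1 - 2p) \cdot a < x(s) \le \frac{a}{5}\}$, $L'_6 \leftarrow \{s \in L : x(s) \le (1 - 2p) \cdot a\}$, 
$L_A \leftarrow \{r \in L'_1 : x(r) \le (1 - p) \cdot a\}$, $L_B \leftarrow \{r \in L'_2 \cup \dots \cup L'_5 : x(r) \le p \cdot a\}$. 

4 $(\mathcal{P}_{AB}, L_{AB}) \leftarrow \mathrm{TC}(L_A, L_B, 0, 1 - p)$; $L_i \leftarrow L'_i \setminus L_{AB}$, for i = 1, ..., 6; 

5 $\mathcal{P}_i \leftarrow \mathrm{NFDH}(L_i)$, i = 1, ..., 6; 

6 $\mathcal{P}_{opt} \leftarrow \mathcal{P}_1 \| \mathcal{P}_{AB}$; $\mathcal{P}_{aux} \leftarrow \mathcal{P}_2 \| \dots \| \mathcal{P}_6$; 

7 Return $\mathcal{P}_{opt} \| \mathcal{P}_{aux}$. 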



Theorem 2. For any input list L for the SPP(a), where the rectangles of L have 
dimensions at most Z, we have $\mathrm{ASP}(L) \le 1.613 \cdot \mathrm{OPT}(L) + 6Z$. 

Proof. Since each rectangle of $L_A$ has width at least $\frac{a}{2}$ and each rectangle of $L_B$ has 
width at least $(1 - 2p)a$, from Lemma 1 we conclude that the following inequality holds. 

$$H(\mathcal{P}_{AB}) \le \frac{S(L_{AB})}{(1/2 + (1 - 2p))\, a} + Z \le \frac{1}{1-p}\,\frac{S(L_{AB})}{a} + Z. \qquad (1)$$

Denote by $L_{opt}$ the set of rectangles packed in $\mathcal{P}_{opt}$. It is easy to see that $\mathcal{P}_{opt}$ is 
an asymptotic optimum packing of $L_{opt}$ since the large rectangles (with $x(r) > \frac{a}{2}$) 
of $L_1 \cup \dots \cup L_6 \cup L_{AB}$ either cannot be rotated, or if they can, they remain in the 
set defined for $L_1$. Moreover, the large rectangles of $L_{opt}$ are packed with the smallest 
possible height. Therefore, we have 

$$H(\mathcal{P}_{opt}) \le \mathrm{OPT}(L) + Z. \qquad (2)$$



Now, we analyze two cases. In the first (second), we consider that all rectangles of $L_A$ 
(all rectangles of $L_B$) are totally packed in $\mathcal{P}_{AB}$. 

Case 1. $L_A \subseteq L_{AB}$. In this case, all rectangles of $L_1 := L'_1 \setminus L_{AB}$ have width 
greater than $(1 - p) \cdot a$. Using (1), we obtain $S(L_{opt}) \ge (H(\mathcal{P}_{opt}) - Z) \cdot (1 - p) \cdot a$. 
That is, 

$$H(\mathcal{P}_{opt}) \le \frac{1}{1-p}\,\frac{S(L_{opt})}{a} + Z. \qquad (3)$$
For each list $L_i$ (i = 2, ..., 6) the area guarantee in each level of $\mathcal{P}_i$, except 
perhaps in the last, is at least $\frac{2}{3}$. This area guarantee is obtained by considering the width 
occupation in each level of the packing generated by the algorithm NFDH. Since the 
packing $\mathcal{P}_{aux}$ is the concatenation of packings $\mathcal{P}_2, \dots, \mathcal{P}_6$, from Lemma 2 we have 

$$H(\mathcal{P}_{aux}) \le \frac{1}{2/3}\,\frac{S(L_{aux})}{a} + 5Z. \qquad (4)$$

Defining $h_1 := H(\mathcal{P}_{opt}) - Z$ and $h_2 := H(\mathcal{P}_{aux}) - 5Z$, we have 

$$\mathrm{OPT}(L) \ge \frac{S(L)}{a} = \frac{S(L_{opt})}{a} + \frac{S(L_{aux})}{a} \ge (1 - p) \cdot h_1 + \frac{2}{3} \cdot h_2. \qquad (5)$$

From (2) and (5) we have $\mathrm{OPT}(L) \ge \max\{h_1,\ (1 - p) \cdot h_1 + \frac{2}{3} \cdot h_2\}$, and therefore 
we obtain $H(\mathcal{P}) \le \alpha_1 \cdot \mathrm{OPT}(L) + 6Z$, where $\alpha_1 = 1 + \frac{3p}{2}$. 

This last inequality can be proved by analyzing the two possible values attained by the 
maximum. 

Case 2. $L_B \subseteq L_{AB}$. The analysis of this case uses the same arguments used in case 1. 
Therefore, we present only the inequality one could obtain: 

$H(\mathcal{P}) \le \alpha_2 \cdot \mathrm{OPT}(L) + 6Z$, 
where $\alpha_2 = 1 + \frac{1}{4p}$. 

Substituting the value of p in the bounds obtained in the two cases above, we obtain 
$H(\mathcal{P}) \le 1.613 \cdot \mathrm{OPT}(L) + 6Z$. □ 
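
The step "analyzing the two possible values attained by the maximum" in Case 1 can be spelled out as follows. This is a sketch under the assumptions that $H(\mathcal{P}) \le h_1 + h_2 + 6Z$ and that $\mathrm{OPT}(L) \ge M$ with $M = \max\{h_1,\ (1-p)h_1 + \tfrac23 h_2\}$ and $\tfrac13 \le p \le \tfrac12$; the closed form $1 + \tfrac{3p}{2}$ is derived here and is not quoted from the paper.

```latex
% Balancing the two terms of the maximum in Case 1.
\begin{align*}
\text{If } h_2 \le \tfrac{3p}{2} h_1: \quad
  & h_1 + h_2 \le \bigl(1 + \tfrac{3p}{2}\bigr) h_1
    \le \bigl(1 + \tfrac{3p}{2}\bigr) M. \\
\text{If } h_2 > \tfrac{3p}{2} h_1: \quad
  & \bigl(1 + \tfrac{3p}{2}\bigr)\bigl((1-p) h_1 + \tfrac{2}{3} h_2\bigr) - (h_1 + h_2)
    = \tfrac{p(1-3p)}{2}\, h_1 + \bigl(p - \tfrac{1}{3}\bigr) h_2 \\
  & \qquad \ge \tfrac{p(1-3p)}{2}\, h_1
      + \bigl(p - \tfrac{1}{3}\bigr) \tfrac{3p}{2}\, h_1 = 0.
\end{align*}
% Hence h_1 + h_2 <= (1 + 3p/2) M in both cases. With p = 1/\sqrt{6}:
%   1 + 3p/2 = 1 + \sqrt{6}/4 \approx 1.6124 \le 1.613.
```

If, as a further assumption, the corresponding maximum in Case 2 is $\max\{h_1,\ \tfrac12 h_1 + 2p\, h_2\}$, the same balancing gives $1 + \tfrac{1}{4p}$, and the two constants coincide exactly when $p^2 = \tfrac16$, which is consistent with the choice $p = 1/\sqrt{6}$ in the algorithm.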



4 Two-Dimensional Bin Packing Problem 

In 1982, Chung et al. [2] presented an algorithm with asymptotic performance bound 
2.125, which is the best bound known for the oriented two-dimensional bin packing 
problem. 

In this section, we present an algorithm using orthogonal rotations, called $\mathrm{BI}_{k,\varepsilon}$, with 
asymptotic performance bound that can be made not larger than 2.64. This algorithm 
follows the same technique used in the algorithm ASP. It also defines critical sets and 
combinations of them, although in more steps and phases. 

Before presenting the algorithm, we describe some algorithms used as subroutines 
by the main algorithm; the first algorithm is the algorithm $\mathrm{NFDH}^x$, used for the packing 
of simple sublists. The algorithm uses two subroutines to combine critical sets: the 
algorithms COMBINE-AB$_k$ and COMBINE-CD. The algorithm $\mathrm{FL}_\varepsilon$ and the algorithm 
FFD, both for the one-dimensional bin packing problem, are used to generate packings of 
special sets that can be interpreted as one-dimensional packing problems. 

The algorithm $\mathrm{FL}_\varepsilon$ was developed by Fernandez de la Vega and Lueker [6]. The 
algorithm presented by these authors is a linear time algorithm with additive constant 
$O(1/\varepsilon)$. For our purposes, we consider a polynomial time version of it with additive 
constant 1, which can be found in [12]: 

Theorem 3. [6,12] For any $\varepsilon > 0$, there exists a polynomial time algorithm $\mathrm{FL}_\varepsilon$ for the 
bin-packing problem such that $\mathrm{FL}_\varepsilon(L) \le (1 + \varepsilon) \cdot \mathrm{OPT}(L) + 1$. 




Packing Problems with Orthogonal Rotations 




The algorithm FFD first sorts the items of L in non-increasing order of their lengths, 
and then packs items in the order given by L. To pack an item, the algorithm FFD tries 
to pack the new item into one of the previous bins, considering the order they were 
generated. If it is not possible to pack in the previous bins, it packs in a new bin. Johnson 
[7] proved that the asymptotic performance bound of the algorithm FFD is 11/9. 

The algorithm $\mathrm{NFDH}^x$ (Next Fit Decreasing Height) first sorts the input list L in 
non-increasing order of height, then packs the rectangles side by side (first horizontally 
and then vertically) generating levels. When a rectangle cannot be packed in the current 
level, a new level is generated parallel to the last one, which becomes the current level. 
When a level cannot be generated in the current bin, it is generated at the bottom of a 
new bin, which becomes the current bin. 

The variant of this algorithm that generates levels in the y-direction is denoted by 
$\mathrm{NFDH}^y$. Subdividing a list $L \subseteq C_m$ into more sublists, and applying an appropriate 
variant of the algorithm $\mathrm{NFDH}^x$, we can obtain an algorithm, denoted by $\mathrm{BI}_m$, for 
which the following lemma holds. 
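
The FFD rule described above can be sketched as follows for the one-dimensional case. The function name `ffd` and the list-of-lists output are assumptions made for this illustration; integer sizes are used to avoid rounding issues.

```python
def ffd(items, capacity):
    """First Fit Decreasing for one-dimensional bin packing (sketch).

    Items are sorted in non-increasing order; each item goes into the
    first bin, in order of creation, that still has room for it, and a
    new bin is opened only if no existing bin fits.
    Returns the list of bins, each a list of item sizes.
    """
    bins = []
    loads = []
    for size in sorted(items, reverse=True):
        if size > capacity:
            raise ValueError("item larger than the bin capacity")
        for j in range(len(bins)):
            if loads[j] + size <= capacity:
                bins[j].append(size)
                loads[j] += size
                break
        else:
            # No earlier bin can take the item: open a new bin.
            bins.append([size])
            loads.append(size)
    return bins
```

For the items 5, 7, 5, 2, 4, 2, 5, 1, 6 and bins of capacity 10, the sketch uses four bins, which is optimal here since the total size is 37.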

Lemma 3. For any list of rectangles L = (ri, . . . , r„), where x{ri) < ^ and y{ri) < 

w have BI„(L) < (^)' S{L) + 6. 

The following lemma gives an upper bound for the number of bins used in a packing 
that has a minimum area occupation in each bin. 



Lemma 4. If $\mathcal{P}$ is a packing of a list of rectangles L such that all bins (a, b), except 
perhaps k of them, have an area occupation of at least s, then $\#(\mathcal{P}) \le \frac{S(L)}{s} + k$. 

The critical sets used by the algorithm COMBINE-AB are defined by the numbers 
$r_i^{(k)}$ and $s_i^{(k)}$, presented in [10,11], defined as follows. 



Definition 1. Let r/ 



(fc) Ak) 



that r[ 



(k) 



< I; 



„('=) 1 



= rf(l-rn=r-r(l- 



(k) 7 (k) 

ands\ 



fk) 






(k), 



(<^) _ 



„(fc) _ 1 



3 

and s 



k-\-i 



fc + 1 
= 1 - 



(- 



4i-|-10 



(fe) _ 
2 “ 

J 



1 

4> 



fk) _ 



k+15 

fori= 1,...,14. 



i_ . 

17’ 



fk)\ 
2 ) 
fk) 



, be real numbers such 



r, ') = ... = r 

„(fc) 



(k) 






= 1 — r ■ ’ for i = 1 , . . . , A;; 



For simplicity we omit the superscripts of the notation when k is clear from 

the context. Using a continuity argument, we can prove that the numbers ri,r 2 , ■ ■ ■ ,fk 
are such that ri > r 2 > • • • > > | and ri — >■ | as A: — >■ oo. Now, we can define the 

following critical sets. 

A=C[r,+i,n; B,=C[^,s,;ri+i,n], S = 



Lemma 5. Given a list L of rectangles for the 2BP(a, b), the algorithm 
COMBINE-AB packs all rectangles of type A or all rectangles of type B. Moreover, 
if $\mathcal{P}_{AB}$ is the packing generated and $L_{AB}$ the rectangles in $\mathcal{P}_{AB}$, then 
$\#(\mathcal{P}_{AB}) \le \frac{36}{17}\,\frac{S(L_{AB})}{ab} + 2k + 41$. 

After applying the algorithm COMBINE-AB, suppose that all rectangles of type B 
have been packed in Vab- Consider the lists Li, . . . , L 23 , defined in step 4.3 of the 
algorithm $\mathrm{BI}_{k,\varepsilon}$. The packing of lists $L_1$ and $L_{18}$, generated by the algorithm NFDH, 

has an area guarantee close to but for sublists L 2 , ■ ■ ■ , T 17 , Tig, . . • , T 23 the NFDH 
strategy generates packings with volume guarantee better than | (namely at least 17/36). 
Therefore, we dehne the critical set Ljj := T/j UL'/, (see step 4.3) as the set of rectangles 
of Li U Tis which leads to packings with area guarantee close to 4/9. The area guarantee 
we can obtain for rectangles in T iT (H U P 4 ) \ Lab is \ ■ Therefore, we define the critical 
set Lc as the critical rectangles in T (T P 4 (see step 4.3). Let us denote the algorithm 
that combines the sets Lc and T 44 by COMBINE-CD. The packing that is generated 
has bins with one rectangle of Lq and one rectangle of L/j , or one rectangle of Lq and 
two rectangles of T'/,. If Vcd is a packing generated by COMBINE-CD, and Lcd is 
the set of rectangles packed in Vcd, then or Lc C Lcd or Lb ^ Lcd - Moreover, the 
packing Vcd has an area guarantee close to 17 /36. More precisely. 

Lemma 6. If Vcd is a packing generated by the algorithm COMBINE-CD and Lcd 
is the set of rectangles in Vcd, then =f{VcD) < riAArxli + 2- 

The packings Vab and Vcd have area guarantee close to Depending on which 
set is totally packed in Vc d , we can improve either the area guarantee of | , of T iT ( U 
P 4 ) \ (Tab UTcr>) packing, or the area guarantee of | , of the TiT (H U P3) \ (Tab UTcb) 
packing. Now, it is possible to obtain an algorithm for 2BP with asymptotic performance 
bound close to 2.64. For simplicity, we denote by Update(T) the procedure that removes 
from T all rectangles previously packed. 

Algorithm $\mathrm{BI}_{k,\varepsilon}$(L) 

Input: List of rectangles L for the 2BP(a, b) 

1 Rotate all rectangles r G L C\ P 4 , where /ci(r) G Pi U ^2 U 

2 / ^ 0.4574271. 

3 $(\mathcal{P}_{AB}, L_{AB}) \leftarrow$ COMBINE-AB$_k$(L). Update(L). 

4 If all rectangles of type B were packed then 

4.1 Rotate each rectangle of $L \cap P_2$ that fits in $P_1 \cup P_3$. 

4.2 Rotate each rectangle of $L \cap (P_2 \cup P_4)$, so that if $b \in L \cap (P_2 \cup P_4)$ then 
$x(b) \le y(b)$ or $\rho(b) \notin C_1$. 



4.3 


Subdivide the 


list T into sublists Ti , . . . , 


T23 as follows. 




Li^Lf]C 1 ; 


i+2’ i+lj 


,for 


i = 1, . . . , 16 , 




Ti 7 t— Tpc [|, 1 ; 


b’ 13] 


Li 8 


t-Tf|C [|, 1 


’ 3 ’ 2J ’ 


Ti 9 


t-Tf|C [|, 1 


’ 4 ’ 3J 


,T2o^TpC[|,i 




L21 


•^-TflC [|, 5 


’ 3 ’ 2J ’ 


T22 


^Tnc[0,i 


> 3 ’ 2J 


, T23 T pc [0, 1 ; 




Lc 


^Tnc[|,l; 




L'd 


t— Ti Pi C [0, f ; 


0,1], 


L]j ■<— Li8 Pi ^ ^ 


; 0,1], 


Ld 


L'd U Tjt). 















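4.4 Generate packing $\mathcal{P}_{CD}$ as follows. 

$(\mathcal{P}_{CD}, L_{CD}) \leftarrow$ COMBINE-CD(L); $L_1 \leftarrow L_1 \setminus L_{CD}$; $L_{18} \leftarrow L_{18} \setminus L_{CD}$. 

4.5 Generate packings $\mathcal{P}_1, \dots, \mathcal{P}_{23}$ as follows. 

$\mathcal{P}_i \leftarrow \mathrm{NFDH}^x(L_i)$ for i = 1, ..., 21; $\mathcal{P}_i \leftarrow \mathrm{NFDH}^y(L_i)$ for i = 22; 
$\mathcal{P}_{23} \leftarrow \mathrm{BI}_3(L_{23})$. Update(L). Now, we have $L \subseteq P_2 \cup P_4$. 

4.6 Consider each rectangle of L as a one-dimensional item of length x(r) and each 
two-dimensional bin as a one-dimensional bin of length a. 

Let $\mathcal{P}_{FL}$ be the packing obtained by $\mathrm{FL}_\varepsilon(L)$ and let $\mathcal{P}_{FFD}$ be the packing 
$\mathrm{FFD}(L \cap X[0, \tfrac13]) \| \mathrm{FFD}(L \cap X[\tfrac13, \tfrac12]) \| \mathrm{FFD}(L \cap X[\tfrac12, 1])$. 

Let $\mathcal{P}_{UNI}$ be the smallest packing in $\{\mathcal{P}_{FL}, \mathcal{P}_{FFD}\}$. 

4.7 $\mathcal{P}_{aux} \leftarrow \mathcal{P}_{AB} \| \mathcal{P}_{CD} \| \mathcal{P}_1 \| \dots \| \mathcal{P}_{23}$; 

4.8 $\mathcal{P} \leftarrow \mathcal{P}_{UNI} \| \mathcal{P}_{aux}$. 

5 If all rectangles of type A were packed, then generate a packing $\mathcal{P}$ of L as in step 4 
(in a symmetric way). 

6 Return $\mathcal{P}$. 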

Theorem 4. For any instance L of 2BP, we have $\mathrm{BI}_{k,\varepsilon}(L) \le \alpha_{k,\varepsilon} \cdot \mathrm{OPT}(L) + O(k)$, 
where $\alpha_{k,\varepsilon} \to 2.63\ldots$ as $k \to \infty$ and $\varepsilon \to 0$. 

Proof. We present the proof for the case in which all rectangles of type B are packed in 
step 3. The proof for the other case (all rectangles of type A are packed) is analogous. 
This proof is divided in two cases, according to step 4.4 ($L_C \subseteq L_{CD}$ or $L_D \subseteq L_{CD}$). 
Analyzing the configuration of each bin in the packing $\mathcal{P}_i$, $i \in \{1, \dots, 23\} \setminus \{1, 18\}$, 
we can show that $\mathcal{P}_i$ has an area occupation of at least $\frac{17}{36} \cdot a \cdot b$ in each bin, except 
perhaps in a constant number of them. Therefore, applying lemmas 4, 3, 5 and 6 we have 



+ for zG{1,...,23}\{1,18}. 



#{rAB) < 



17 ab 
36 S{Lab) 



+ (2A: + 41), #{Vcd)< 



S{Lcd) 



17 ab 

Case 1. Lc Q Lcd - For packings V\ and Vis, we have 



(i + ^) 






9 



7 ’ ^ / — 4 1. 

r\ ab H ab 

By Theorem 3, and analyzing the area guarantee of Pffd, we have 

#(T’uni) < #(T’flJ < (1 + e) • OPT(Luni) + 1- 

*S'(Luni) 



#(T’uni) < #(T’ffd) < 



1 

(1-i) 



ab 



+ 3. 



(6) 

+ 2. (7) 

( 8 ) 

(9) 

( 10 ) 



Now, for the packing Vaux = Vab\\Vi || . . . ||P 23 . using the inequalities (6)-(8) and 
the fact that ri = min {||)ri,|; + ^,|},we obtain 

m'Paux) < ^^%^ + (2A: + 68), 

where Laux denotes the set of rectangles in Vaux- Let rzi := #(Puni) — 3 and ri 2 := 
f^iVaux) — (2fc + 68). From inequality (9) and the fact that OPT > we have 
OPT(L) > max|Y^m, +ri •n 2 |. Since #(T’) = #{Vaux) + #{V\mi), 
now we have #(P) = (rz 2 + {2k + 68) + rzi + 3) = rzi + rz 2 + {2k + 71) . Therefore, 



BIfc,,(L) < < , • OPT(L) + {2k + 71) , 



where 



«fc.e = (^1 +n 2 )/max| +ri • n 2 | < ;L _ + (1 + e). 

Case 2 . Ld C Lcd- In this case, the proof is analogous and we can obtain that 



BIfc,,(L) < < , • OPT(L) + {2k + 71), where a" ^ = 



(rti+rt2) 



7 “ + (1 + e)- 

From cases 1 and 2, we conclude that for k 



max} j^ni,i-ni+t'n2} 



< 



oo and e — 0 the theorem follows. □ 
5 Three-Dimensional Packing Problems 

Due to space limitations, we only present the results for the algorithms ATP, ATP^ and 
A3D we obtained by applying the same techniques to the problems TPP, TPP^ and 
3BP, respectively. 

Theorems. For any instance L o/TPP, we have ATP^ ^(L) < ^ • OPT(L) + 

O {k + ■ Z, where at,e — >■ 2.76 . . . as k ^ oo and e — >■ 0. 

Theorem 6. For any instance L ofTPP^, we have ATP^ ^(L) < ak,e ■ OPT(T) + 
O (k + ■ Z, where ak,e — >■ 2.63 . . . as k ^ oo and e — >■ 0. 

This last result improves our previous result in [10,11]. 

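Theorem 7. For any list of boxes L for 3BP, we have $\mathrm{A3D}(L) \le \alpha_{k,\varepsilon} \cdot \mathrm{OPT}(L) + \beta_{k,\varepsilon}$, 
where $\lim_{k \to \infty} \alpha_{k,\varepsilon} \le 4.88\ldots$, and $\beta_{k,\varepsilon}$ is a constant for constant values of k and $\varepsilon$. 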



6 Concluding Remarks 

We presented several approximation algorithms for packing problems where orthogonal 
rotations are allowed. These problems have been less investigated in the literature. To our 
knowledge, the bounds presented are the best ones known for each problem. We would 
like to thank David S. Johnson, for his comments about the status of these problems. 
Abstract. We consider the problem of protein folding in the HP model 
on the 3D square lattice. This problem is combinatorially equivalent to 
folding a string of 0's and 1's so that the string forms a self-avoiding walk 
on the lattice and the number of adjacent pairs of 1's is maximized. The 
previously best-known approximation algorithm for this problem has a 
guarantee of 3/8 = .375 [HI95]. In this paper, we first present a new 
3/8-approximation algorithm for the 3D folding problem that improves on 
the absolute approximation guarantee of the previous algorithm. We then 
show a connection between the 3D folding problem and a basic 
combinatorial problem on binary strings, which may be of independent interest. 
Given a binary string in {a, b}*, we want to find a long subsequence of 
the string in which every sequence of consecutive a's is followed by at 
least as many consecutive b's. We show a non-trivial lower bound on the 
existence of such subsequences. Using this result, we obtain an algorithm 
with a slightly improved approximation ratio of at least .37501 for the 
3D folding problem. All of our algorithms run in linear time. 



1 Introduction 

We consider the problem of protein folding in the HP model on the 
three-dimensional (3D) square lattice. This optimization problem is combinatorially 
equivalent to folding a string of 0's and 1's, i.e. placing adjacent elements of the 
string on adjacent lattice points, so that the string forms a self-avoiding walk on 
the lattice and the number of adjacent pairs of 1's is maximized. Figure 1 shows 
an example of a 3D folding of a binary string. 
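
The objective described above can be made concrete with a small checker. The sketch below assumes a folding is given as a list of integer lattice points, one per character, and counts pairs of 1's that are adjacent on the lattice but not consecutive in the string; the function name `count_contacts` and this input format are illustrative choices, not taken from the paper.

```python
def count_contacts(s, coords):
    """Count 1-1 contacts of a folding of the binary string s (sketch).

    coords[i] is the (x, y, z) lattice point of s[i]. The folding must
    be a self-avoiding walk: consecutive characters occupy adjacent
    lattice points and no point is used twice.
    """
    if len(s) != len(coords):
        raise ValueError("one lattice point per character is required")
    if len(set(coords)) != len(coords):
        raise ValueError("folding is not self-avoiding")
    for p, q in zip(coords, coords[1:]):
        if sum(abs(u - v) for u, v in zip(p, q)) != 1:
            raise ValueError("consecutive characters must be lattice neighbours")
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if s[i] != "1":
            continue
        # Look only in the positive directions so each pair is counted once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is not None and s[j] == "1" and abs(i - j) > 1:
                contacts += 1
    return contacts
```

For example, folding the string 1001 around a unit square brings its two 1's next to each other and yields one contact, while a straight-line folding of any string yields none.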

Background. The widely studied HP model was introduced by Dill [Dil85,
Dil90]. A protein is a chain of amino acid residues. In the HP model, each amino
acid residue is classified as an H (hydrophobic or non-polar) or a P (hydrophilic
or polar). An optimal configuration for a string of amino acids in this model
is one that has the lowest energy, which is achieved when the number of H-H
contacts (i.e. pairs of H’s that are adjacent in the folding but not in the string)
is maximized. The protein folding problem in the hydrophobic-hydrophilic (HP)
model on the 3D square lattice is combinatorially equivalent to the problem we
just described: we are given a string of P’s and H’s (instead of 0’s and 1’s) and
we wish to maximize the number of adjacent pairs of H’s (instead of 1’s). An
informative discussion on the HP model and its applicability to protein folding
is given by Hart and Istrail [HI95].

Related Work. Berger and Leighton proved that this problem is NP-hard
[BL98]. On the positive side, Hart and Istrail gave a simple algorithm
with an approximation guarantee of (3/8)·OPT − Θ(√OPT) [HI95]. Folding in the
HP model has also been studied for the 2D square lattice. This variant is also
NP-hard [CGP+98]. Hart and Istrail gave a 1/4-approximation algorithm for this
problem [HI95], which was recently improved to a 1/3-approximation algorithm
[New02].

Our Contribution. Improving on the approximation guarantee of 3/8 for the 3D
folding problem has been an open problem for almost a decade. In this paper, we
first present a new 3D folding algorithm (Section 2.1). Our algorithm produces
a folding with (3/8)·OPT − Θ(1) contacts, improving the absolute approximation
guarantee. We then show that if the input string is of a certain special form,
we can modify our algorithm to yield (3/4)·OPT − O(δ(S)) contacts, where δ(S)
is the number of transitions in the input string S from sequences of 1’s in odd
positions in the string to sequences of 1’s in even positions. This is described in
Section 2.2.

In Section 3, we reduce the general 3D folding problem to the special case
above, yielding a folding algorithm producing .439 · OPT − O(δ(S)) contacts.
This reduction is based on a simple combinatorial problem for strings, which
may be of independent interest.

We call a binary string from {a, b}* block-monotone if every maximal
sequence of consecutive a’s is immediately followed by a block of at least as many
b’s. Suppose we are given a binary string with the following property: every
suffix of the string (i.e. every sequence of consecutive elements that ends with
the last element of the string) contains at least as many b’s as a’s. What is the
longest block-monotone subsequence of the string? It is easy to see that we can
find a block-monotone subsequence with length at least half the length of the
string by removing all the a’s. In Section 3.1, we show that there always is a
block-monotone subsequence containing at least a (2 − √2) ≈ .5857 fraction of
the string’s elements.

Finally, we combine our folding algorithm with a simple, case-based algorithm
that achieves .375 · OPT + Ω(δ(S)) contacts, which is described in the full version
of this paper. We thereby remove the dependence on δ(S) in the approximation
guarantee and obtain an algorithm with a slightly improved approximation
guarantee of .37501 for the 3D folding problem. Due to space restrictions, all proofs
are omitted and can be found in the full version of this paper.



2 A New 3D Folding Algorithm

Let S ∈ {0, 1}^n represent the string we want to fold. We refer to each 0 or 1
as an element. We let s_i represent the i-th element of S, i.e. S = s_1 s_2 ... s_n. We
refer to a 1 in an odd position (i.e. s_i = 1 with odd index i) as an odd-1 and
a 1 in an even position (i.e. s_i = 1 with even index i) as an even-1. An odd or
even label is determined by an element’s position in the input string and does
not change at any stage of the algorithm. We will use O[S] and E[S] to denote
the number of odd-1’s and even-1’s, respectively, in a string S. For example, for
S = 10111100101101, we have O[S] = 5 and E[S] = 4.
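
As a quick check of this notation, the following Python sketch counts odd-1’s
and even-1’s using 1-based positions. The function name count_odd_even_ones is
our own choice for illustration and does not come from the paper.

```python
def count_odd_even_ones(s: str) -> tuple[int, int]:
    """Return (O[S], E[S]): the number of 1's in odd and in even positions.

    Positions are 1-based, matching the indexing s_1 s_2 ... s_n in the text.
    """
    odd = sum(1 for i, c in enumerate(s, start=1) if c == "1" and i % 2 == 1)
    even = sum(1 for i, c in enumerate(s, start=1) if c == "1" and i % 2 == 0)
    return odd, even


# The example string from the text.
print(count_odd_even_ones("10111100101101"))
```

For the example string this prints (5, 4), in agreement with the counts stated
above.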

Note that because the square lattice is bipartite, the odd/even label deter- 
mines the set of lattice points on which an element can be placed. For example, 
suppose we divide the lattice points into two bipartite sets, one red and one 
blue. If the first element of the string is placed on a red lattice point, then all 
the elements in odd positions in the string will be placed on red lattice points 
and all the elements in even positions in the string will be placed on blue lattice 
points. 

A contact between two elements placed on the square lattice can therefore 
only occur between an odd-1 and an even-1. Each lattice point is adjacent to six 
neighboring lattice points. In any folding, if an odd-1 is placed on a particular 
lattice point, two neighboring lattice points will be occupied by preceding and 
succeeding (even) elements of the string unless the element is one of the two 
endpoints of the string. Therefore, there are four remaining adjacent lattice 
points with which contacts can be formed. Thus, an upper bound on the size of
an optimal solution is OPT ≤ 4·min{O[S], E[S]} + 2.
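
The bound above depends only on the two counts, so it is cheap to evaluate. A
minimal sketch, assuming the bound reads 4·min{O[S], E[S]} + 2 (the helper name
contact_upper_bound is hypothetical):

```python
def contact_upper_bound(s: str) -> int:
    """Evaluate 4 * min(O[S], E[S]) + 2 for a binary string s.

    Each 1 away from the two endpoints of the string has at most four free
    neighbouring lattice points; the additive 2 covers the two endpoints.
    """
    odd = sum(1 for i, c in enumerate(s, start=1) if c == "1" and i % 2 == 1)
    even = s.count("1") - odd
    return 4 * min(odd, even) + 2
```

For S = 10111100101101 the counts are 5 and 4, so the function returns
4·4 + 2 = 18.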

2.1 The Diagonal Folding Algorithm 

We now present an algorithm that produces a folding with at least (3/8)·OPT −
O(1) contacts in the worst case, thereby improving the absolute approximation
guarantee of the algorithm of Hart and Istrail [HI95]. Our algorithm is based
on diagonal folds. The algorithm guarantees that contacts form on and between
two adjacent 2D planes. Each point in the 3D lattice has an (x, y, z)-coordinate,
where x, y, and z are integers. We will fold the string so that all contacts occur on
or between the planes z = 0 and z = 1. The Diagonal Folding Algorithm
is described below and illustrated in Figure 1.

Lemma 1. The Diagonal Folding Algorithm produces a folding with at
least (3/8)·OPT − O(1) contacts.

2.2 Relating Folding to String Properties 

As the number of 1’s placed on the diagonal in the Diagonal Folding
Algorithm increases, the length of the resulting folding (which is proportional
to min{O[S], E[S]}) increases in a direction parallel to the line x = y. The
height of the folding may also increase depending on the maximum distance
between consecutive odd-1’s in S_O or consecutive even-1’s in S_E. However,
regardless of the input string, the resulting folding has the same constant
width in the direction parallel to the line x = −y. In other words, although
the algorithm produces a three-dimensional folding, with increasing k and n,
the folding may increase in length and height but not in width. We will explain
how we can use this unused space to improve the algorithm for a special class
of strings.

Diagonal Folding Algorithm

Input: a binary string S.

Output: a folding of the string S.

1. Let k = min{O[S], E[S]}.

2. Divide S into two strings such that S_O contains at least half the odd-1’s and
S_E contains at least half the even-1’s. We can do this by finding a point on the
string such that half of the odd-1’s are on one side of this point and half the
odd-1’s are on the other side. One of these sides contains at least half of the
even-1’s. We call this side S_E and the remaining side S_O. Then we replace all
the even-1’s in S_O with 0’s and replace all the odd-1’s in S_E with 0’s.

3. Place the first odd-1 in S_O on lattice point (1, 1, 1) and the next odd-1 in S_O
on lattice point (2, 2, 1) and so on. For the first k/4 of the odd-1’s in S_O, place
the i-th odd-1 on lattice point (i, i, 1). Then place the (k/4 + 1)-th odd-1 on
lattice point (k/4 − 1, k/4 + 1, 1). For the first k/4 − 1 of the even-1’s in S_E,
place the i-th even-1 on lattice point (i, i + 1, 1). Use the dimensions z > 1 to
place the strings of 0’s between consecutive odd-1’s in S_O and the strings of 0’s
between consecutive even-1’s in S_E.

4. Place the (k/4 + 2)-th odd-1 in S_O on lattice point (k/4 − 2, k/4 + 1, 0). Then
place the (k/4 + i)-th odd-1 in S_O on lattice point (k/4 − i + 1, k/4 − i + 2, 0).
Place the (k/4)-th even-1 in S_E on lattice point (k/4 − 1, k/4 − 1, 0). Place the
(k/4 + i)-th even-1 in S_E on lattice point (k/4 − i − 1, k/4 − i − 1, 0). Use the
dimensions z < 0 to place the strings of 0’s between consecutive 1’s in S_O or S_E.
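
Step 2 of the Diagonal Folding Algorithm (splitting S and discarding the 1’s
of the unwanted parity on each side) can be sketched in Python as follows. This
is an illustration under stated assumptions, not the authors’ implementation:
the function name is hypothetical, and when the number of odd-1’s is odd we
assume the prefix receives the extra one, so "half" holds only up to rounding.

```python
def split_for_diagonal_fold(s: str) -> tuple[str, str]:
    """Return (S_O, S_E) for a binary string s, following Step 2.

    Parity always refers to 1-based positions in the original string s.
    """
    ones = [(i, i % 2 == 1) for i, c in enumerate(s, start=1) if c == "1"]
    odd_positions = [i for i, is_odd in ones if is_odd]
    total_even = len(ones) - len(odd_positions)

    # Cut right after the ceil(half)-th odd-1, so each side holds about half.
    half = (len(odd_positions) + 1) // 2
    cut = odd_positions[half - 1] if odd_positions else 0

    # One of the two sides must contain at least half of the even-1's.
    prefix_even = sum(1 for i, is_odd in ones if not is_odd and i <= cut)
    prefix_is_even_side = 2 * prefix_even >= total_even

    def keep_only(lo: int, hi: int, keep_odd: bool) -> str:
        # Copy s_lo..s_hi, turning every 1 of the unwanted parity into a 0.
        return "".join(
            c if c == "0" or (i % 2 == 1) == keep_odd else "0"
            for i, c in enumerate(s[lo - 1:hi], start=lo)
        )

    n = len(s)
    if prefix_is_even_side:
        return keep_only(cut + 1, n, True), keep_only(1, cut, False)
    return keep_only(1, cut, True), keep_only(cut + 1, n, False)
```

For S = 10111100101101 the cut falls after position 5, and the function returns
S_O = 10101 (three odd-1’s) and S_E = 100000101 (three even-1’s). Note that the
returned strings do not record their offset in S, so a caller that needs the
original parities must keep track of the cut position separately.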

By consecutive odd-1’s we mean odd-1’s that are not separated by even-1’s
and similarly for consecutive even-1’s. For example, in the string 1010001100011,
there is a sequence of three consecutive odd-1’s followed by two consecutive
even-1’s followed by an odd-1.

Definition 2. A string S_O is called odd-monotone if every maximal sequence
of consecutive even-1’s is immediately preceded by at least as many consecutive
odd-1’s.

An even-monotone string is defined analogously. For example, the string
10101100011 is odd-monotone and the string 0100010101101101011 is
even-monotone. We define a switch as follows:

Definition 3. A switch is an odd-1 followed by an even-1 (separated only by
0’s). We denote the number of switches in S by δ(S).

For example, for the string S = 100100010101101101011, δ(S) = 2 since there
are two transitions (from position 1 to position 4 and from position 15 to
position 16) from a maximal sequence of consecutive odd-1’s
to a sequence of even-1’s. We use these definitions in the following theorem.

Fig. 1. This figure illustrates Steps 3 and 4 of the Diagonal Folding Algorithm.
In the folding resulting from this algorithm, all contacts are formed on or between the
2D planes z = 0 (lower) and z = 1 (upper). Black dots represent 1’s and white dots
represent 0’s.
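
Definitions 2 and 3 can be checked mechanically. The sketch below labels each 1
by the parity of its position and then inspects maximal runs of equal labels;
all function names are our own and are used only for illustration.

```python
from itertools import groupby


def parity_labels(s: str) -> str:
    """Label each 1 of s by 'o' (odd position) or 'e' (even position), 1-based."""
    return "".join("o" if i % 2 else "e" for i, c in enumerate(s, start=1) if c == "1")


def count_switches(s: str) -> int:
    """Number of switches: odd-1's whose next 1 in the string is an even-1."""
    labels = parity_labels(s)
    return sum(1 for x, y in zip(labels, labels[1:]) if (x, y) == ("o", "e"))


def is_monotone(s: str, leading: str) -> bool:
    """leading='o' tests odd-monotone, leading='e' tests even-monotone."""
    runs = [(label, len(list(group))) for label, group in groupby(parity_labels(s))]
    for idx, (label, length) in enumerate(runs):
        if label != leading:
            # This run must be immediately preceded by a run at least as long.
            if idx == 0 or runs[idx - 1][1] < length:
                return False
    return True
```

On the examples from the text, parity_labels("1010001100011") gives "oooeeo",
count_switches("100100010101101101011") gives 2, and is_monotone accepts
10101100011 with leading="o" and 0100010101101101011 with leading="e".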



Theorem 4. Let S = S_O S_E and let S_O be an odd-monotone string and S_E be
an even-monotone string such that O[S_O] = E[S_E] and E[S_O] = O[S_E]. Then
there is a linear time algorithm that folds these two strings achieving
(3/4)·OPT − 16δ(S) − O(1) contacts.

The main idea behind the proof of Theorem 4 is that we partition the
elements in S_O and S_E into main-diagonal elements and off-diagonal elements.
We then use the Diagonal Folding Algorithm to fold the main-diagonal
elements along the direction x = y and the off-diagonal elements into branches
along the direction x = −y (see Figure 2). All 1’s will receive 3 contacts except
for a constant number of 1’s for each off-diagonal branch, which correspond to
switches in the strings S_O and S_E, and a constant number at the ends of the main
diagonal. This yields the claimed number of (3/4)·OPT − O(δ(S)) − O(1) contacts.

To precisely define main-diagonal and off-diagonal elements, we use
additional notation. We use 0^k and 1^k (for some integer k ≥ 0) to refer to the
strings consisting of k 0’s and k 1’s, respectively. By writing S = E^k for some
integer k, we mean that S is of the form
S = 0^{2i_0+1} 1 0^{2i_1+1} 1 0^{2i_2+1} 1 0^{2i_3+1} ...
for integers i_j ≥ 0, and all the 1’s in S are even-1’s. Likewise, we write
S = O^k to refer to a string of the same form where all 1’s are odd-1’s, i.e.
S = 1 0^{2i_1+1} 1 0^{2i_2+1} 1 0^{2i_3+1} ... . We can express any string S_E
as S_E = E^{a_1} O^{b_1} ... E^{a_k} O^{b_k} for k = δ(S_E) and integers a_i and b_i. If
S_E is even-monotone, then a_i ≥ b_i for all i. We can express any string S_O as
S_O = O^{c_1} E^{d_1} ... O^{c_t} E^{d_t} for t = δ(S_O) and integers c_i and d_i. If S_O is
odd-monotone, then c_i ≥ d_i for all i.

Definition 5. For an odd-monotone string
S_O = O^{c_1} E^{d_1} O^{c_2} E^{d_2} ... O^{c_t} E^{d_t}, the first c_i − d_i odd-1’s in
each block O^{c_i} are the main-diagonal elements and the remaining elements
O^{d_1} E^{d_1} O^{d_2} E^{d_2} ... O^{d_t} E^{d_t} are the off-diagonal elements in S_O.

For even-monotone strings, we define main-diagonal and off-diagonal
elements analogously. In our modified algorithm, it will be useful to have S_E and
S_O in a special form. Two sets of off-diagonal elements in S_O, O^{d_i} E^{d_i} and
O^{d_{i+1}} E^{d_{i+1}}, are separated by c_{i+1} − d_{i+1} odd-1’s that are main-diagonal
elements. We want them to be separated by a number of main-diagonal elements
that is a multiple of 8. This will guarantee that the off-diagonals used to fold
the off-diagonal elements are regularly spaced so that none of the off-diagonal
folds interfere with each other. We will use the following simple lemma.

Lemma 6. For any odd-monotone string S_O it is possible to change at most
8δ(S_O) 1’s to 0’s so that the resulting string S' is of the form
S' = O^{a_1} E^{b_1} O^{a_2} E^{b_2} ... O^{a_k} E^{b_k}, where a_i − b_i is a positive
multiple of 8 for 1 ≤ i ≤ k.

We note that there is an analogous version of Lemma 6 for even-monotone 
strings. With this preparation, we can now state our folding algorithm. 



Off-Diagonal Folding Algorithm

Input: A binary string S = S_O S_E, such that S_O is odd-monotone, S_E is
even-monotone, O[S_O] = E[S_E] and E[S_O] = O[S_E].

Output: A folding of the string S.

1. Change at most 8δ(S) 1’s to 0’s in S_O and S_E to yield the form specified in
Lemma 6.

2. Run Diagonal Folding Algorithm on main-diagonal elements along the
direction x = y and change from plane z = 0 to z = 1 when the length of the
main diagonal equals 4 · ⌊O[S_O]/8⌋ + 2.

3. Run Diagonal Folding Algorithm on the off-diagonal elements along the
direction x = −y. The off-diagonal elements attached to the main-diagonal
elements on the plane z = 1 are folded along the diagonals x = −y + 8k. The
off-diagonal elements attached to the main-diagonal elements on the plane z = 0
are folded along the diagonals x = −y + 8k + 4. (See Figure 2.)

Fig. 2. Folding the off-diagonal elements in Step 3 of the Off-Diagonal Folding
Algorithm. The main-diagonal elements are represented by the dashed lines on
the main diagonal. The off-diagonal elements are represented by the solid lines on the
off-diagonals. This figure shows how the repetitions of the Diagonal Folding
Algorithm on the off-diagonals interleave and thus do not interfere with each other.
The closeup gives an example of how the off-diagonal folds are connected to the main
diagonal.



3 Combinatorial Problems on Strings 

In this section, we present a combinatorial theorem about binary strings that
allows us to use the algorithm from Section 2.2 for the general 3D folding
problem. The binary strings that we consider in this section are from the set {a, b}*.
Given a string to fold in {0, 1}*, we map it to a corresponding string in {a, b}* by
representing each odd-1 by an a and each even-1 by a b. For example, the string
10100101 would be mapped to the string aabb. We will use theorems about the
strings in {a, b}* to prove theorems about subsequences of the strings in {0, 1}*
that we want to fold.

The combinatorial problem that we want to solve is the following: given a
string S ∈ {0, 1}* such that E[S] = O[S], we want to divide the string into two
substrings such that one contains an even-monotone subsequence and the other
contains an odd-monotone subsequence and the number of 1’s contained in these
monotone subsequences is as large as possible, since the 1’s in these subsequences
are the 1’s that will have contacts in the Off-Diagonal Folding Algorithm.

Given a string S ∈ {0, 1}*, we will treat it as a loop L(S) by attaching its
endpoints. In other words, we are only going to consider foldings of the string 
that place the first and last element of S on adjacent lattice points. (If S has 
odd length, we can add a 0 to the end of the string and fold this string instead 
of S; a folding of this augmented string will yield a valid folding of the original 
string.) 

Lemma 7. Let L(S) ∈ {0, 1}* be a loop, and k = min{O[S], E[S]}. Then it is
possible to change some 1’s of L(S) to 0’s such that there is a partition L(S) =
S_O S_E with S_O and S_E odd- and even-monotone, respectively, O[S_O] = E[S_E],
E[S_O] = O[S_E], and O[S_O] + O[S_E] ≥ (2 − √2)k. Furthermore, this partition can
be constructed in linear time.

To prove this lemma, we first apply Lemma 2.2 from [New02] to cut the 
string into two substrings and then apply Theorem 13 to each substring. Lemma 
7 implies that every 3D folding instance can be converted into the case required 
by Theorem 4 by converting not too many 1’s into 0’s. We obtain the following
corollary of Lemma 7 and Theorem 4. 

Corollary 8. There is a linear time algorithm for the 3D folding problem that
generates at least .439 · OPT − 16δ(S) − O(1) contacts.



3.1 Block-Monotone Subsequences 

Let S be a binary string, S ∈ {a, b}^n. We will use the following definitions.

Definition 9. Let n_a(S) and n_b(S) denote the number of a’s and b’s,
respectively, in a string S.

Definition 10. A block is a maximal substring of consecutive a’s or b’s in a
binary string.

Definition 11. A binary string is block-monotone if every block of a’s is
immediately followed by a block of at least as many b’s.

For example, the string bbbbaaabb has two blocks of b’s (of length four and two)
and one block of a’s (of length three). An example of a block-monotone string
is baaabbbaaabbbb. The string aabbaaabb is not block-monotone, since its second
block of a’s has length three but is followed by only two b’s.
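
Definitions 10 and 11 translate directly into a short check. The following
Python sketch (function names are ours) splits a string into its blocks and
tests block-monotonicity on the examples just given.

```python
from itertools import groupby


def blocks(s: str) -> list[tuple[str, int]]:
    """Split s into maximal runs, returned as (character, length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def is_block_monotone(s: str) -> bool:
    """True if every block of a's is immediately followed by at least as many b's."""
    runs = blocks(s)
    for idx, (ch, length) in enumerate(runs):
        if ch == "a":
            # A trailing block of a's has no following block of b's at all.
            if idx + 1 == len(runs) or runs[idx + 1][1] < length:
                return False
    return True
```

Here blocks("bbbbaaabb") yields [("b", 4), ("a", 3), ("b", 2)], and
is_block_monotone accepts baaabbbaaabbbb while rejecting aabbaaabb.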

Given a binary string S, our goal is to find a long block-monotone
subsequence. It is easy to see that S contains a block-monotone subsequence of length
at least n_b(S) since the subsequence of b’s is trivially block-monotone. It is also
easy to see that there are strings for which we cannot do better than this. For
example, consider the string b^i a^i. In this string, there is no block-monotone
subsequence that contains any of the a’s. Thus, we will put a stronger condition on
the binary strings in which we want to find long block-monotone subsequences.

Notation. α := 1 − 1/√2 ≈ 0.2929

Definition 12. A binary string S = s_1 ... s_n is suffix-monotone if for every
suffix S̄_k = s_{k+1} ... s_n, 0 ≤ k < n, we have n_b(S̄_k) ≥ α · (n − k).

For example, if every suffix of S has at least as many b’s as a’s, the string is
suffix-monotone. We will give an algorithm to prove the following theorem.
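
The suffix-monotone condition can be tested in a single right-to-left pass. The
sketch below assumes α = 1 − 1/√2 and the non-strict reading n_b ≥ α · (length of
the suffix); it uses floating-point arithmetic, which is adequate for short
strings because α is irrational and so exact ties cannot occur for non-empty
suffixes. The function name is hypothetical.

```python
import math

ALPHA = 1 - 1 / math.sqrt(2)  # approximately 0.2929


def is_suffix_monotone(s: str, alpha: float = ALPHA) -> bool:
    """True if every non-empty suffix of s has at least alpha * length b's."""
    n_b = 0
    for length, ch in enumerate(reversed(s), start=1):
        if ch == "b":
            n_b += 1
        if n_b < alpha * length:
            return False
    return True
```

For instance, aab is suffix-monotone (its suffixes b, ab, aab contain one b each,
against thresholds of roughly 0.29, 0.59 and 0.88), whereas aaab is not, because
the whole string would need at least 4α ≈ 1.17 b’s.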

Theorem 13. Suppose S is a suffix-monotone string of length n. Then there
is a block-monotone subsequence of S with length at least n − n_a(S)(2√2 − 2).
Furthermore, such a subsequence can be found in linear time.

If n_a(S) ≤ n/2 and S is suffix-monotone, then Theorem 13 states that we
can find a block-monotone subsequence of length at least (2 − √2) > .5857 times
the length of S, since n − (n/2)(2√2 − 2) = (2 − √2)n. This is accomplished by
the following algorithm.



Block-Monotone Algorithm

Input: a suffix-monotone string S = s_1 ... s_n
Output: a block-monotone subsequence of S
Let S_i = s_1 ... s_i, S̄_i = s_{i+1} ... s_n for i : 1 ≤ i ≤ n

1. If s_1 = b:

(i) Find the largest index k such that S_k is a block of b’s and output S_k

2. If s_1 = a:

(i) Find the smallest index k such that:

n_b(S_k) ≥ αk

(ii) Let S'_ℓ = s_{ℓ+1} ... s_k for ℓ : 1 ≤ ℓ < k

(iii) Find ℓ such that:

n_a(S_ℓ) ≤ n_b(S'_ℓ)

n_a(S_ℓ) + n_b(S'_ℓ) is maximized

(iv) Remove all the b’s from S_ℓ and output S_ℓ

(v) Remove all the a’s from S'_ℓ and output S'_ℓ

3. Repeat algorithm on string S̄_k
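
The Block-Monotone Algorithm can be sketched in Python as below. This is our
reading of the steps, not the authors’ code: we assume α = 1 − 1/√2, take S_k in
step 2(i) to be the prefix of length k of the current string, let ℓ range over
1 ≤ ℓ < k, and replace the recursion of step 3 by a loop. The function name and
the choice to raise ValueError on inputs that are not suffix-monotone are ours.

```python
import math

ALPHA = 1 - 1 / math.sqrt(2)


def block_monotone_subsequence(s: str) -> str:
    """Return a block-monotone subsequence of a suffix-monotone string over {a, b}."""
    out = []
    pos, n = 0, len(s)
    while pos < n:
        if s[pos] == "b":
            # Step 1: output the maximal leading block of b's.
            end = pos
            while end < n and s[end] == "b":
                end += 1
            out.append(s[pos:end])
            pos = end
            continue

        # Step 2(i): smallest k such that the prefix of length k has >= ALPHA * k b's.
        n_b, k = 0, None
        for length in range(1, n - pos + 1):
            if s[pos + length - 1] == "b":
                n_b += 1
            if n_b >= ALPHA * length:
                k = length
                break
        if k is None:
            raise ValueError("input is not suffix-monotone")

        # Steps 2(ii)-(iii): choose the split point l maximizing
        # (#a's in the first l symbols) + (#b's in the remaining k - l symbols),
        # subject to the a-count not exceeding the b-count.
        window = s[pos:pos + k]
        best = None
        a_left = b_left = 0
        for l in range(1, k):
            if window[l - 1] == "a":
                a_left += 1
            else:
                b_left += 1
            b_right = n_b - b_left
            if a_left <= b_right and (best is None or a_left + b_right > sum(best)):
                best = (a_left, b_right)

        # Steps 2(iv)-(v): output the kept a's followed by the kept b's.
        out.append("a" * best[0] + "b" * best[1])
        pos += k  # Step 3: continue on the remaining suffix.
    return "".join(out)
```

On the suffix-monotone input aabab the first round stops at k = 3 (window aab)
and outputs ab, the second round stops at k = 2 and outputs ab again, so the
result is abab. Each round scans its window a constant number of times, so the
sketch runs in time linear in the length of the input.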



4 Conclusion 

We conclude by stating an approximation guarantee independent of δ(S). In the
full version of this paper, we give a case-based algorithm whose approximation
guarantee is (3/8)·OPT + Ω(δ(S)). This algorithm is based on the following idea:
Suppose S_O and S_E contain half the odd-1’s and half the even-1’s, respectively.
We use the Diagonal Folding Algorithm, but for each switch in S_O, we use
different local foldings to obtain an additional (constant) number of contacts,
e.g. we use an even-1 in the switch to obtain another contact with an odd-1
placed on the main diagonal. The performance of this algorithm is summarized
in Lemma 14, which in combination with Corollary 8 yields Lemma 15.

Lemma 14. We can modify the Diagonal Folding Algorithm to create a
folding with (3/8)·OPT + Ω(δ(S)) − O(1) contacts for any binary string S.

Lemma 15. There is a linear time algorithm for the 3D folding problem that
creates a folding with .37501 · OPT − O(1) contacts for any binary string S.

We have described an algorithm for protein folding in the HP model on the 3D 
square lattice that slightly improves on the previously best-known algorithm to 
yield an approximation guarantee of .37501. The contribution of this paper is 
not so much the actual gain in the approximation ratio, but the demonstration 
that the previously best-known algorithm is not optimal, even though there have 
been no improvements for almost a decade. 

In closing, we discuss the problem of finding block-monotone subsequences 
of binary strings. One way to improve the approximation ratio of our algorithm 
is to improve the guarantee given by Theorem 13. We note that we only apply 
Theorem 13 to binary strings in which every suffix contains at least as many b’s
as a’s, a stronger condition than our definition of suffix-monotone. Theorem 13
implies that such strings contain block-monotone subsequences of at least .5857 
their length. We conjecture that the real lower bound is actually | their length. 
Currently, the best upper bound we are aware of is the string: 

aaaaabaaaabaaabaababbbaaabaaabababaababbbbbbbbbbbbbb 

whose longest block-monotone subsequence is ≈ 71.15% of
the length of the original string.
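
To experiment with candidate strings for such upper bounds, one can compute the
length of a longest block-monotone subsequence exactly by dynamic programming.
The sketch below is a quadratic-time exhaustive method of our own, not the
linear-time algorithm of Theorem 13; the function name and the state encoding
are assumptions made for illustration.

```python
def longest_block_monotone_subsequence(s: str) -> int:
    """Length of a longest block-monotone subsequence of a string over {a, b}."""
    # State ("A", c): the chosen subsequence ends in an open block of c a's.
    # State ("B", d): it ends in b's (or is empty) and still needs d more b's.
    best = {("B", 0): 0}
    for ch in s:
        updated = dict(best)  # option 1: skip this character
        for (phase, x), length in best.items():
            if ch == "a":
                if phase == "A":
                    nxt = ("A", x + 1)   # extend the open block of a's
                elif x == 0:
                    nxt = ("A", 1)       # start a new block of a's
                else:
                    continue             # previous block of a's not yet balanced
            else:
                nxt = ("B", x - 1) if phase == "A" else ("B", max(0, x - 1))
            if updated.get(nxt, -1) < length + 1:
                updated[nxt] = length + 1  # option 2: take this character
        best = updated
    # Only states with no outstanding b's correspond to block-monotone strings.
    return best[("B", 0)]
```

For example, the function returns 8 for aabbaaabb (drop one a from the second
block of a’s) and 1 for ba (only the b can be kept).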
Abstract. The Partial Digest problem asks for the coordinates of m
points on a line such that the pairwise distances of the points form a given
multiset of (m choose 2) distances. Partial Digest is a well-studied problem
with important applications in physical mapping of DNA molecules. Its
computational complexity status is open. Input data for Partial Digest
from real-life experiments are always prone to error, which suggests
studying variations of Partial Digest that take this fact into account.
In this paper, we study the computational complexity of the variation
of Partial Digest in which each distance is known only up to some
error, due to experimental inaccuracies. The error can be specified either
by some additive offset or by a multiplicative factor. We show that both
types of error make the Partial Digest problem strongly NP-complete,
by giving reductions from 3-Partition. In the case of relative errors, we
show that the problem is hard to solve even for constant relative error.



1 Introduction 

The Partial Digest problem is perhaps the classic combinatorial problem from 
computational biology with applications in DNA sequencing. Despite
considerable research efforts in the past twenty years, its computational complexity is
still an open problem. In the Partial Digest problem we are given a multiset
D of distances and are asked to find coordinates of points on a line, i.e., a point
set P, such that D is exactly the multiset¹ of all pairwise distances of these
points. In this case, we say that D is the distance multiset of point set P. A
formal definition of the problem is as follows.

Definition 1 (Partial Digest). Given an integer m and a multiset of k =
(m choose 2) positive integers D = {d_1, ..., d_k}, is there a set of m integers P =
{p_1, ..., p_m} such that {|p_i − p_j| : 1 ≤ i < j ≤ m} = D?

For example, if D = {2, 5, 7, 7, 9, 9, 14, 14, 16, 23}, then P = {0, 7, 9, 14, 23} is
one feasible solution (cf. Figure 1).
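
Checking a candidate solution for Partial Digest only requires comparing two
multisets. A minimal Python sketch, with function names chosen by us for
illustration:

```python
from collections import Counter
from itertools import combinations


def distance_multiset(points):
    """Multiset of all pairwise distances of the given points on a line."""
    return Counter(abs(p - q) for p, q in combinations(points, 2))


def is_partial_digest_solution(points, distances):
    """True if `distances` is exactly the distance multiset of `points`."""
    return distance_multiset(points) == Counter(distances)


D = [2, 5, 7, 7, 9, 9, 14, 14, 16, 23]
print(is_partial_digest_solution([0, 7, 9, 14, 23], D))
```

For the example above this prints True: the ten pairwise distances of
{0, 7, 9, 14, 23} are 7, 9, 14, 23, 2, 7, 16, 5, 14 and 9.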

* * * Work partially done while M. Cieliebak was visiting LANL. LA-UR-03:6621. 

¹ We will denote multisets like sets, since the fact of being a multiset is not crucial for
our purposes.

Fig. 1. Example for Partial Digest



Previous Work 

Intriguingly, the computational complexity of this seemingly straight-forward 
combinatorial puzzle is a long-standing open problem, and it appears in its pure 
combinatorial formulation already in the 1930’s in the area of X-ray crystallog- 
raphy (acc. to [16]). The problem is also known as “turnpike problem”, where 
we are given the pairwise distances of cities along a highway, and we want to 
find their ordering along the road [4]. The Partial Digest problem can be 
solved in pseudo-polynomial time [10,13], and there exists a backtracking algo- 
rithm (for exact or erroneous data) that has expected running time polynomial 
in the number of distances [16,17], but exponential worst case running time [20]. 
The Partial Digest problem can be formalized by cut grammars, which have 
one additional symbol S, the cut, that is neither a non-terminal nor a terminal 
symbol [14], and the problem is closely related to the theory of homometric sets²
[16]. Finally, if the points in a solution do not have to be on a line, but only in 
d-dimensional space, then the problem is NP-hard for some d > 2 [16]. However, 
for the original Partial Digest problem, neither a polynomial-time algorithm 
nor a proof of NP-hardness is known [2,4,11,12,15]. 

Biological Background 

Partial Digest has several applications; the classical and most prominent is 
in the study of the structure of DNA molecules. More precisely, given a large 
DNA molecule (sequence of nucleotides A, C, G, and T), restriction enzymes 
can be used to generate a physical map of the molecule. A restriction enzyme 
cuts a DNA molecule at specific patterns, the restriction sites. For instance, the 
enzyme EcoRI cuts occurrences of the pattern GAATTC into G and AATTC. Under
appropriate experimental conditions, all fragments between each two restriction
sites are created. This process is called partial digestion. The lengths of the
fragments (i.e., their number of nucleotides) are then measured by using gel
electrophoresis, a standard technique in molecular biology. This leaves us with
the multiset of distances between all restriction sites, and the objective is to
reconstruct the original ordering of the fragments in the DNA molecule, which
is the Partial Digest problem.

² Two (non-congruent) sets of points are homometric if they generate the same
multiset of pairwise distances.

Erroneous Data 

In real life, partial digestion experiments cannot be conducted under ideal
conditions as outlined above, and thus errors occur in the data. In fact, there is
no such thing as error-free data, and typically four types of errors occur [5,6,8,
17]: additional fragments, for instance through contamination of the probe with
unrelated biological material; missing fragments, due to partial cleavage errors,
or because of small fragments that remain undetected by gel electrophoresis;
incorrect fragment lengths, due to the fact that fragment lengths cannot be
determined exactly using gel electrophoresis; and, finally, wrong multiplicities, due
to the intrinsic difficulty of determining the proper multiplicity of a distance by
gel electrophoresis³.

Algorithms for Partial Digest with inaccurate data have been studied in- 
tensively in the literature [5,8,17], and different error models have been designed, 
e.g. for measurement errors that are logarithmic in the size of the fragment 
length [18,19] or for intervals of absolute errors [1,17]. Optimization variations 
of Partial Digest where fragments are either omitted or added in the data, 
and the number of errors has to be minimized, are known to be NP-hard or hard
to approximate, respectively [3]. 

In this work we will focus on the third type of error, where the lengths of 
fragments can be erroneous (measurement errors). In partial digestion exper-
iments all measurements of fragment lengths are prone to inaccuracies: Using 
gel electrophoresis, measurement errors within a range of up to 5 percent of the 
fragment length can occur [5,6,17]. 

Many experimental variations of partial digest experiments have been stud- 
ied, see [9] for a survey; and for more detailed discussions on the problem, see [12] 
and [15]. 

Definitions and Results 

In this paper, we study the computational complexity of Partial Digest in the 
presence of measurement errors, where we consider both additive and
multiplicative errors.

We start with additive errors. The Partial Digest problem is known to
be strongly NP-hard if additive error bounds that can even be zero can be
assigned to each distance individually [9,16]. However, this does not model reality
appropriately, since in real-life data we cannot assume that even one single
fragment length can be measured exactly. Therefore, we study the computational
complexity of the variation of Partial Digest where all measurements are
prone to some non-zero error. Moreover, we refrain from individual error bounds,
and study the variation where all measurements are prone to the same additive
non-zero error δ. More precisely, we say that value v matches a distance d
up to additive error δ if |v − d| ≤ δ; moreover, a multiset D is a distance
multiset of a point set P up to additive error δ if there is a bijective function
f : D → Δ(P) such that each distance d ∈ D matches value f(d) up to error
δ; here, Δ(P) = {|p_i − p_j| : 1 ≤ i < j ≤ n} denotes the multiset of pairwise
distances in P. The PD-AbsError problem is defined as follows.

³ The multiplicity of a fragment is determined from the intensity of the corresponding
band in the gel.

Definition 2 (PD-AbsError). Given an integer m, a multiset D of k = (m choose 2)
positive integers, and an integer error bound δ > 0, is there a set P of m points
on a line such that D is the distance multiset of P up to additive error δ?

We show in Section 2 that PD-AbsError is strongly NP-complete, by giving 
a reduction from 3-Partition. 

We then turn to the case of multiplicative errors. We say that distance d
matches a value x up to multiplicative error ε > 0 if d(1 − ε) ≤ x ≤ d(1 + ε).
Observe that this definition is not symmetric, i.e., if d matches x up to error ε,
then this does not in general imply that x matches d (in contrast to the definition
of additive errors, which is symmetric). A multiset D is a distance multiset of
point set P up to multiplicative error ε if there is a bijective function f : D →
Δ(P) such that each distance d ∈ D matches value f(d) up to multiplicative
error ε. The PD-RelError problem is defined as follows.

Definition 3 (PD-RelError). Given an integer m, a multiset D of k = (m choose 2)
positive integers, and a rational error ε > 0, is there a set P of m points on a
line such that D is the distance multiset of P up to multiplicative error ε?
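
The asymmetry of multiplicative matching is easy to see on a small example. The
sketch below reads the two inequalities as non-strict (an assumption on our
part) and uses exact rational arithmetic; the function name matches_rel is
hypothetical.

```python
from fractions import Fraction


def matches_rel(d, x, eps):
    """True if distance d matches value x up to multiplicative error eps."""
    eps = Fraction(eps)
    return d * (1 - eps) <= x <= d * (1 + eps)


eps = Fraction(1, 2)
print(matches_rel(10, 6, eps))  # is 6 within [5, 15]?
print(matches_rel(6, 10, eps))  # is 10 within [3, 9]?
```

With ε = 1/2 the distance 10 matches the value 6, since 5 ≤ 6 ≤ 15, but the
distance 6 does not match the value 10, since 10 > 9. The script therefore
prints True followed by False.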

We show in Section 3 that PD-RelError is strongly NP-complete, even for 
constant error, by using a similar reduction as for PD-AbsError. 

2 Strong NP-Completeness of PD-AbsError

In this section, we show that PD-AbsError is strongly NP-complete, by giving
a reduction from 3-Partition, which is the following problem: Given 3n positive
integers q_1, ..., q_{3n} and an integer h such that Σ_{i=1}^{3n} q_i = n·h and
h/4 < q_i < h/2 for i ∈ {1, ..., 3n}, are there n disjoint triples of q_i’s such
that each triple adds up to h? The 3-Partition problem is NP-complete in the
strong sense [7]. Observe that h/4 < q_i < h/2 already implies that each subset
of the q_i’s that adds up to h must have exactly three elements.

The idea of the reduction is as follows. Given an instance q_1, ..., q_{3n} and
h of 3-Partition, we define a multiset of distances D and an additive error
5=1 that form an instance of PD-AbsError. Our construction is based on 
the following observation: If there is a solution for the 3-Partition instance, 
then we can arrange the qfs such that triples of adjacent g^’s sum up to h. If we 
sum up, say, 25 adjacent qi, then we sum over at least 7 complete triples that 
each have sum h, plus some few (up to four) additional qfs at the beginning 
and the end. In the special and trivial case that all qfs have exactly value 
we can easily determine the exact sum of the 25 values. However, in a given 
instance of 3-Partition typically not all qfs will have value |. However, they 
have “approximately” value |, since they satisfy | < < | by definition. In 

the proof of the following theorem, we will use additive error 6 to “close the gap” 
between | and the true values of the qfs. 
Theorem 4. PD-AbsError is strongly NP-complete.

Proof. The problem PD-AbsError is in NP: Given a candidate point set $P$, we sort all distances between any two points in $P$, and all distances in $D$; then $P$ is a solution if error $\delta$ is sufficient to match the $i$-th distance from $P$ to the $i$-th distance from $D$.
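
The membership argument can be illustrated with a short Python sketch. The function name `matches_abs` and the list-based input format are assumptions made for this illustration only; the check itself is the sort-and-compare test described in the proof.

```python
from itertools import combinations

def matches_abs(points, distances, delta):
    """Return True if `distances` is the distance multiset of `points`
    up to additive error `delta`.

    Both multisets are sorted and compared position by position,
    as in the NP-membership argument.
    """
    pairwise = sorted(abs(a - b) for a, b in combinations(points, 2))
    given = sorted(distances)
    if len(pairwise) != len(given):
        return False
    # The i-th smallest pairwise distance must match the i-th smallest
    # given distance within the allowed error.
    return all(abs(p - d) <= delta for p, d in zip(pairwise, given))
```

For instance, the points 0, 2, 7 have pairwise distances 2, 5, 7, which match the multiset {3, 4, 8} with error 1 but not with error 0.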

To prove strong NP-hardness, we give a reduction from 3-Partition. Given an instance of 3-Partition, i.e., integers $q_1, \ldots, q_{3n}$ and integer $h$, we define a distance multiset $D$ and an additive error $\delta$ that are an instance of PD-AbsError. There will be a solution for this instance if and only if there is a solution for the 3-Partition instance. In parallel to the definition of $D$, we already show the "if" direction of the previous statement: To this end, we assume that the 3-Partition instance can be solved, i.e., there are $n$ triples $T_1, \ldots, T_n$ of $q_i$'s that each sum up to $h$. We show how to construct a point set $P$ that is a solution for the PD-AbsError instance, i.e., $P$ matches $D$ up to additive error $\delta$. The opposite direction ("only if") is shown in a second step. We want to stress at this point that although the definition of $D$ and the construction of $P$ are presented simultaneously, the definition of $D$ itself does not rely on the fact that there exists a solution for the 3-Partition instance.

We assume that $\frac{h}{12}$ is integer. Otherwise, we can achieve this by simply multiplying all values $q_i$ and $h$ by 12. Moreover, we assume w.l.o.g. that the values $q_1, \ldots, q_{3n}$ are ordered such that the three $q_i$'s that belong to the same triple $T_j$ in a solution are adjacent, i.e., $T_1 = (q_1, q_2, q_3)$, $T_2 = (q_4, q_5, q_6)$, and so on. Finally, we assume that the elements in each $T_j$ are sorted in ascending order, i.e., $q_1 \le q_2 \le q_3$, $q_4 \le q_5 \le q_6$, and so on. This ordering allows us to derive a set of inequalities for the $q_i$'s. Let $(q_{3k+1}, q_{3k+2}, q_{3k+3})$ be a triple that sums up to $h$, for $0 \le k \le n-1$. Then $q_{3k+1} \le \frac{h}{3}$, since $q_{3k+1}$ is the smallest of the three elements in the triple, and not all of them can be greater than $\frac{h}{3}$. Similarly, $\frac{h}{3} \le q_{3k+3}$. With $q_{3k+1} + q_{3k+2} = h - q_{3k+3}$, we have $q_{3k+1} + q_{3k+2} \le h - \frac{h}{3} = \frac{2h}{3}$. In combination with the restriction $\frac{h}{4} < q_i < \frac{h}{2}$ (from the definition of 3-Partition) and $H := \frac{h}{12}$, this yields the following inequalities:

$$
\begin{aligned}
3H &< q_{3k+1} \le 4H \\
3H &< q_{3k+2} < 6H \\
4H &\le q_{3k+3} < 6H \\
6H &< q_{3k+1} + q_{3k+2} \le 8H \\
8H &\le q_{3k+2} + q_{3k+3} < 12H \\
12H &= q_{3k+1} + q_{3k+2} + q_{3k+3}
\end{aligned}
\tag{1}
$$



We will use these inequalities later to derive upper and lower bounds for the 
additive error that we need to apply to our distances in order to guarantee the 
existence of a solution for the PD-AbsError instance. 
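
As a sanity check on inequalities (1), the following Python sketch enumerates, for a given integer $H$, every sorted triple with sum $h = 12H$ whose entries lie strictly between $h/4$ and $h/2$, and collects any triple that violates one of the bounds. The function name `violations` is an assumption for this illustration.

```python
def violations(H):
    """Return all sorted triples (q1, q2, q3) with q1 + q2 + q3 = 12H and
    3H < qi < 6H that violate one of the inequalities (1)."""
    h = 12 * H
    bad = []
    for q1 in range(3 * H + 1, 6 * H):
        for q2 in range(q1, 6 * H):
            q3 = h - q1 - q2
            # Keep only sorted triples whose third entry is admissible.
            if q3 < q2 or not (3 * H < q3 < 6 * H):
                continue
            ok = (
                3 * H < q1 <= 4 * H
                and 3 * H < q2 < 6 * H
                and 4 * H <= q3 < 6 * H
                and 6 * H < q1 + q2 <= 8 * H
                and 8 * H <= q2 + q3 < 12 * H
                and q1 + q2 + q3 == 12 * H
            )
            if not ok:
                bad.append((q1, q2, q3))
    return bad
```

An empty result for a range of values of $H$ is consistent with the derivation above; it is of course not a proof.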

Before we define our distances, we need to introduce the level of a distance: For a point set $P$, we say that a distance $d$ between two points has level $\ell$ if it spans $\ell - 1$ further points, and we say that distance $d$ is an atom if it has level 1. E.g. in Figure 1, distance 5 is an atom, and distance 16 has level 3.
[Figure 2: the sequence of atoms z1 z2 z3 c1 z4 z5 z6 c2 z7 z8 z9 c3, with distance d[7, 2, 1] and the sums in the third digit marked.]

Fig. 2. Atoms and distances in multiset D.



In the following, we will use a vector representation for large numbers that allows us to add up the numbers digit by digit. The numbers are expressed in the number system of some base $Z$. We denote by $(a_1, \ldots, a_n)$ the number $\sum_{i=1}^{n} a_i Z^{n-i}$. We say that $a_i$ is the $i$-th digit of this number. In our proofs, we will choose base $Z$ large enough such that the additions that we will perform do not lead to carry-overs from one digit to the next. Hence, we can add numbers digit by digit. The same holds for scalar multiplications. For example, having base $Z = 29$ and numbers $\alpha = (3, 5, 1)$ and $\beta = (2, 1, 0)$, then $\alpha + \beta = (5, 6, 1)$ and $3 \cdot \alpha = (9, 15, 3)$.
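
The digit-vector arithmetic can be tried out with the following Python sketch. The helper names (`to_number`, `digits_of`, `add`, `scale`) are assumptions for this illustration, and the first digit is taken to be the most significant one. The sketch shows that digit-wise operations agree with ordinary integer arithmetic as long as no digit reaches the base.

```python
def to_number(digits, Z):
    """Interpret a digit vector (most significant digit first) in base Z."""
    value = 0
    for d in digits:
        value = value * Z + d
    return value

def digits_of(value, Z, n):
    """Return the n least significant base-Z digits of value, most significant first."""
    out = []
    for _ in range(n):
        out.append(value % Z)
        value //= Z
    return tuple(reversed(out))

def add(a, b):
    """Digit-wise sum of two vectors of equal length."""
    return tuple(x + y for x, y in zip(a, b))

def scale(s, a):
    """Digit-wise scalar multiple of a vector."""
    return tuple(s * x for x in a)
```

With base 29 and the vectors (3, 5, 1) and (2, 1, 0) from the example, converting to integers, adding, and converting back gives the same vector as `add`; with base 10 the product 3 · (3, 5, 1) produces a carry and the two views no longer agree.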

We now define our instance of PD-AbsError and show at the same time 
how to construct a solution for this instance. Let c = rP ■ hp. Moreover, define 
error $\delta := 3H$. The distances are expressed as numbers with base $Z = 10nc$, and each distance consists of three digits. The first digit will denote the level of a distance (the meaning of the other two digits will become clear soon).

First we define $4n - 1$ distances that will turn out to be atoms in our solution: $z_i = (1, 0, q_i) - \delta$, for $1 \le i \le 3n$, and $c_i = (1, c, 0) - \delta$, for $1 \le i \le n-1$. Observe that operation "$-\delta$" only affects the last digit (and in fact, we could have defined $z_i$ by $(1, 0, q_i - \delta)$ instead), since we choose base $Z$ sufficiently large.

Using these distances, we can already define a "solution" $P$ for distance multiset $D$ (although we are not yet finished defining $D$; in fact, we will construct $D$ in the following such that it matches point set $P$ up to additive error $\delta$): Let $\hat{z}_i = z_i + \delta$ for $1 \le i \le 3n$, and $\hat{c}_i = c_i + \delta$ for $1 \le i \le n-1$. Observe that each $\hat{z}_i$ has exactly value $q_i$ in its third digit. We call these values z-pseudoatoms or c-pseudoatoms, respectively, and use them to define a point set $P = \{p_1, \ldots, p_{4n}\}$ by specifying the pairwise distances between the points: Starting in 0, the points have distances $\hat{z}_1, \hat{z}_2, \hat{z}_3, \hat{c}_1, \hat{z}_4, \hat{z}_5, \hat{z}_6, \hat{c}_2, \ldots, \hat{c}_{n-1}, \hat{z}_{3n-2}, \hat{z}_{3n-1}, \hat{z}_{3n}$, i.e., we alternate blocks of three z-pseudoatoms and one c-pseudoatom, starting and ending with a block of three z-pseudoatoms (see Figure 2).

We now show level by level how the distances in $D$ are defined, and that additive error $\delta$ (which is $3H$) is sufficient to make all distances from $D$ match some distance between points in $P$.

By construction of $P$, the distances of level 1 are the pseudoatoms, and they match the corresponding $z_i$'s and $c_i$'s up to additive error $\delta$.

To denote the distances of higher levels we use notation $d[\ell, j, k]$, for appropriate parameters $\ell$, $j$ and $k$. These names already indicate the values of the three digits of a distance: Distance $d[\ell, j, k]$ will have value $\ell$ in the first digit, which will be the level of the distance in our point set $P$. The second digit of the distance has value $j \cdot c$, which denotes that this distance will be used to span $j$ c-pseudoatoms (and $\ell - j$ z-pseudoatoms) in our point set $P$. For instance, in Figure 2 distance $d[7, 2, 1]$ spans the two pseudoatoms $\hat{c}_1$ and $\hat{c}_2$ (and five $\hat{z}_i$'s). Finally, the third digit of distance $d[\ell, j, k]$ has value $k \cdot h$ plus some "small offset", which will be a multiple of $H$. Here, $k$ specifies how many complete blocks of three adjacent z-pseudoatoms the distance spans in $P$ (recall that such a block corresponds to three $q_i$'s that sum up to exactly $h$). In the following, we show how to choose these offsets in the third digit such that our point set $P$ matches distance multiset $D$ up to additive error $\delta$.

First consider distances of level 2 in $P$, i.e., two points $p_i, p_{i+2} \in P$ with one point $p_{i+1}$ in between. There are four possibilities for the two pseudoatoms between these two points, for some $0 \le k \le n-1$: Case 1: $\hat{z}_{3k+1}$ and $\hat{z}_{3k+2}$; Case 2: $\hat{z}_{3k+2}$ and $\hat{z}_{3k+3}$; Case 3: $\hat{z}_{3k+3}$ and $\hat{c}_{k+1}$; and Case 4: $\hat{c}_k$ and $\hat{z}_{3k+1}$.

For the first case, the two pseudoatoms sum up to 2 in the first and to 0 in the second digit. For the third digit of the sum, recall that $\hat{z}_{3k+1}$ has value $q_{3k+1}$ in its third digit, and $\hat{z}_{3k+2}$ has value $q_{3k+2}$ in its third digit. Hence, inequalities (1) yield that the third digit of $\hat{z}_{3k+1} + \hat{z}_{3k+2}$ is bounded below by $6H$ and bounded above by $8H$. We define a distance $d[2, 0, 0] := (2, 0, 9H)$. Obviously, we can span the two pseudoatoms by this distance if we apply at most error $\delta$ (recall that $\delta = 3H$). Observe that we could have chosen other values for the third digit of $d[2, 0, 0]$, namely any value between $5H$ and $9H$ (which still allows us to match the bounds using additive error $\delta$). Here, we chose value $9H$, since we will use that same distance to cover the two pseudoatoms in Case 2 as well (see below).

Case 1 occurs exactly $n$ times in our point set $P$, once for each block of three z-pseudoatoms. Hence, we include distance $d[2, 0, 0]$ $n$ times in our distance multiset $D$.

Case 2 is similar to Case 1: The third digit of $\hat{z}_{3k+2} + \hat{z}_{3k+3}$ is bounded below by $8H$ and bounded above by $12H$, using again inequalities (1). As before, this case occurs $n$ times, and we can use $n$ additional distances $d[2, 0, 0]$ in $D$ to span two such pseudoatoms up to error $\delta$. Thus, in total we have $2n$ distances $d[2, 0, 0]$ in $D$ that arise from the first two cases.

For the remaining two cases of two pseudoatoms, the last digit of the sum of the two pseudoatoms is at least $4H$ and at most $6H$ in Case 3, and at least $3H$ and at most $4H$ in Case 4. Moreover, in both cases the first digit of the sum is 2 and the second digit is $c$, and both cases occur exactly $n - 1$ times. Hence, we can define distance $d[2, 1, 0] := (2, c, 4H)$ and include it $2(n-1)$ times in $D$, in order to cover these pairs of pseudoatoms, again up to additive error $\delta$.

Before we specify the distances of higher level, we introduce a graphical representation of pseudoatoms: Each z-pseudoatom is represented by a •, and each c-pseudoatom by a |. This allows us to depict sequences of pseudoatoms without referring to their exact names. E.g. pseudoatoms $\hat{z}_3 \hat{c}_1 \hat{z}_4 \hat{z}_5 \hat{z}_6 \hat{c}_2$ yield •|•••|, and the four cases of two adjacent pseudoatoms above can be represented by ••, ••, •| and |•. Figure 3 shows the distances, bounds, and multiplicities for level 2 to 7.

level  pseudoatoms  multiplicity  lower bound  upper bound  distance name  distance value
2      ••           n             6H           8H           d[2,0,0]       (2, 0, 9H)
2      ••           n             8H           12H          d[2,0,0]       (2, 0, 9H)
2      •|           n − 1         4H           6H           d[2,1,0]       (2, c, 4H)
2      |•           n − 1         3H           4H           d[2,1,0]       (2, c, 4H)
3      •••          n             12H          12H          d[3,0,1]       (3, 0, 12H) + δ
3      |••          n − 1         6H           8H           d[3,1,0]       (3, c, 9H)
3      •|•          n − 1         7H           10H          d[3,1,0]       (3, c, 9H)
3      ••|          n − 1         8H           12H          d[3,1,0]       (3, c, 9H)
4      ••|•         n − 1         11H          16H          d[4,1,0]       (4, c, 13H)
4      •|••         n − 1         10H          14H          d[4,1,0]       (4, c, 13H)
4      •••|         n − 1         12H          12H          d[4,1,1]       (4, c, 12H)
4      |•••         n − 1         12H          12H          d[4,1,1]       (4, c, 12H)
5      ••|••        n − 1         14H          20H          d[5,1,0]       (5, c, 17H)
5      •••|•        n − 1         15H          16H          d[5,1,1]       (5, c, 16H)
5      •|•••        n − 1         16H          18H          d[5,1,1]       (5, c, 16H)
5      |•••|        n − 2         12H          12H          d[5,2,1]       (5, 2c, 12H)
6      •••|••       n − 1         18H          20H          d[6,1,1]       (6, c, 21H)
6      ••|•••       n − 1         20H          24H          d[6,1,1]       (6, c, 21H)
6      •|•••|       n − 2         16H          18H          d[6,2,1]       (6, 2c, 16H)
6      |•••|•       n − 2         15H          16H          d[6,2,1]       (6, 2c, 16H)
7      •••|•••      n − 1         24H          24H          d[7,1,2]       (7, c, 24H)
7      ••|•••|      n − 2         20H          24H          d[7,2,1]       (7, 2c, 21H)
7      •|•••|•      n − 2         19H          22H          d[7,2,1]       (7, 2c, 21H)
7      |•••|••      n − 2         18H          20H          d[7,2,1]       (7, 2c, 21H)

Fig. 3. Distances up to level 7.

Observe that $d[2, 0, 0]$ and $d[6, 1, 1]$ are in a sense "equivalent", since they are used for cases that differ only in one complete block of three z-pseudoatoms and one c-pseudoatom. Hence, we could have written $d[6, 1, 1] = d[2, 0, 0] + (4, c, h)$ instead. Moreover, $d[6, 2, 1] = d[2, 1, 0] + (4, c, h)$ and $d[7, 2, 1] = d[3, 1, 0] + (4, c, h)$. Similarly, distances of level greater than 7 can be decomposed into a distance of low level (4 to 7) and an appropriate number of blocks of three z-pseudoatoms and one c-pseudoatom. We set $\beta := (4, c, h)$ and define in Figure 4 the distances of level 8 to $4n - 5$. In the table, the number of blocks $k$ varies from 1 to $n - 3$. Finally, in Figure 5 the distances that have level $4n - 4$ to $4n - 1$ are shown. Observe that as before they are derived from distances of level 4 to 7, for $k = n - 2$. However, not all combinations are necessary for these distances.

Our distance multiset $D$ consists of all atoms $z_i$ and $c_i$, and all distances specified in Figures 3, 4 and 5, with the corresponding multiplicities. There are $4n - 1$ levels, and for each level $\ell$ there are $4n - \ell$ distances in $D$. In total, this yields $\sum_{\ell=1}^{4n-1} (4n - \ell) = \binom{4n}{2}$ distances. The cardinality of $D$ is polynomially bounded in $n$, and each distance in $D$ is polynomial in $h$. Hence, multiset $D$ can be constructed in polynomial time from a given instance of 3-Partition.
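
The bookkeeping for levels 2 to 7 can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the rows are transcribed from Figure 3 with third digits in units of $H$ (so that $\delta = 3$ and $(3, 0, 12H) + \delta$ becomes 15), a multiplicity $n - a$ is stored as the offset $a$, and the names `FIG3`, `rows_within_error`, `multiplicities_ok` and `total_distances` are hypothetical.

```python
from math import comb

DELTA = 3  # additive error delta = 3H, in units of H

# (level, multiplicity offset a meaning n - a, lower bound, upper bound, value)
FIG3 = [
    (2, 0, 6, 8, 9), (2, 0, 8, 12, 9), (2, 1, 4, 6, 4), (2, 1, 3, 4, 4),
    (3, 0, 12, 12, 15), (3, 1, 6, 8, 9), (3, 1, 7, 10, 9), (3, 1, 8, 12, 9),
    (4, 1, 11, 16, 13), (4, 1, 10, 14, 13), (4, 1, 12, 12, 12), (4, 1, 12, 12, 12),
    (5, 1, 14, 20, 17), (5, 1, 15, 16, 16), (5, 1, 16, 18, 16), (5, 2, 12, 12, 12),
    (6, 1, 18, 20, 21), (6, 1, 20, 24, 21), (6, 2, 16, 18, 16), (6, 2, 15, 16, 16),
    (7, 1, 24, 24, 24), (7, 2, 20, 24, 21), (7, 2, 19, 22, 21), (7, 2, 18, 20, 21),
]

def rows_within_error():
    """Every value lies within DELTA of both ends of its bound interval."""
    return all(max(abs(v - lo), abs(v - hi)) <= DELTA
               for _, _, lo, hi, v in FIG3)

def multiplicities_ok():
    """Four rows n - a per level sum to 4n - level iff the offsets sum to level."""
    for level in range(2, 8):
        offsets = [a for (l, a, _, _, _) in FIG3 if l == level]
        if len(offsets) != 4 or sum(offsets) != level:
            return False
    return True

def total_distances(n):
    """Sum of 4n - level over all levels 1, ..., 4n - 1."""
    return sum(4 * n - level for level in range(1, 4 * n))
```

The last function can be compared against `comb(4 * n, 2)` for small `n` to confirm the closed form of the total count.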

In parallel to the definition of $D$, we have shown already that a solution for the 3-Partition instance yields a solution for the PD-AbsError instance. In the following, we show the opposite direction, i.e., we show that a solution for the PD-AbsError instance yields a solution for the 3-Partition instance. Let $R = \{r_1, \ldots, r_{4n}\}$ be any set of $4n$ points on a line that is a solution for the PD-AbsError instance, i.e., multiset $D$ is the multiset of pairwise distances of $R$, up to additive error $\delta$ for each distance. We assume w.l.o.g. that the points are ordered from left to right, i.e., $r_1 < r_2 < \ldots < r_{4n}$. We will show that $R$ is basically identical to $P$, the point set that we constructed above.

level   pseudoatoms (Fig. 3 pattern, extended by k blocks)   multiplicity   distance name         distance value
4 + 4k  ••|•      n − k − 1   d[4+4k, 1+k, 0+k]   d[4,1,0] + k·β
4 + 4k  •|••      n − k − 1   d[4+4k, 1+k, 0+k]   d[4,1,0] + k·β
4 + 4k  •••|      n − k − 1   d[4+4k, 1+k, 1+k]   d[4,1,1] + k·β
4 + 4k  |•••      n − k − 1   d[4+4k, 1+k, 1+k]   d[4,1,1] + k·β
5 + 4k  ••|••     n − k − 1   d[5+4k, 1+k, 0+k]   d[5,1,0] + k·β
5 + 4k  •••|•     n − k − 1   d[5+4k, 1+k, 1+k]   d[5,1,1] + k·β
5 + 4k  •|•••     n − k − 1   d[5+4k, 1+k, 1+k]   d[5,1,1] + k·β
5 + 4k  |•••|     n − k − 2   d[5+4k, 2+k, 1+k]   d[5,2,1] + k·β
6 + 4k  •••|••    n − k − 1   d[6+4k, 1+k, 1+k]   d[6,1,1] + k·β
6 + 4k  ••|•••    n − k − 1   d[6+4k, 1+k, 1+k]   d[6,1,1] + k·β
6 + 4k  •|•••|    n − k − 2   d[6+4k, 2+k, 1+k]   d[6,2,1] + k·β
6 + 4k  |•••|•    n − k − 2   d[6+4k, 2+k, 1+k]   d[6,2,1] + k·β
7 + 4k  •••|•••   n − k − 1   d[7+4k, 1+k, 2+k]   d[7,1,2] + k·β
7 + 4k  ••|•••|   n − k − 2   d[7+4k, 2+k, 1+k]   d[7,2,1] + k·β
7 + 4k  •|•••|•   n − k − 2   d[7+4k, 2+k, 1+k]   d[7,2,1] + k·β
7 + 4k  |•••|••   n − k − 2   d[7+4k, 2+k, 1+k]   d[7,2,1] + k·β

Fig. 4. Distances with level 8 to 4n − 5 (with β = (4, c, h)). Value k varies between 1 and n − 3.

Obviously, additive error $\delta$ can affect only the last digit of each distance, since base $Z$ is sufficiently large. Thus, exactly those distances with value 1 in the first digit are atoms, since all other distances have value greater than 1 in the first digit, and since there must be exactly $4n - 1$ atoms. This implies immediately that the first digit of each distance denotes the level of the distance in any solution.

We now show that error $+\delta$ has to be applied to each single atom to make it fit to the distances between adjacent points in $R$. To see this, first observe that the atoms sum up to $\sum_{i=1}^{3n} z_i + \sum_{i=1}^{n-1} c_i = (4n - 1, (n-1)c, nh) - (4n - 1)\delta$.

On the other hand, $d[4n - 1, n - 1, n] = (4n - 1, (n-1)c, nh) + \delta$ is the largest distance in $D$. Each atom is the distance between two adjacent points in $R$, up to additive error $\delta$, while $d[4n - 1, n - 1, n]$ is the distance between the first and the last point in $R$, again up to additive error $\delta$. Hence, the atoms must sum up to the length of the largest distance. This is only possible if we apply error $+\delta$ to each atom, yielding sum $(4n - 1, (n-1)c, nh)$, and if we apply error $-\delta$ to the largest distance, yielding $(4n - 1, (n-1)c, nh)$ as well. Knowing this, we can again define pseudoatoms $\hat{z}_i = z_i + \delta$ and $\hat{c}_i = c_i + \delta$, which represent exactly the distances of adjacent points in $R$ (without error). Observe that if we represented the distances between adjacent points in $R$ in our number representation, then pseudoatom $\hat{z}_i$ would have exactly value $q_i$ in its last digit, for all $1 \le i \le 3n$.

We now show that the ordering of the pseudoatoms arising from $R$ is such that there are $n$ blocks of three pseudoatoms $\hat{z}_i$, and each two blocks are separated by one pseudoatom $\hat{c}_i$. Between any two adjacent c-pseudoatoms there must be exactly three z-pseudoatoms: Since there are no distances of level 4 with value $2c$ in the second digit, no combination ||, |•| or |••| is possible, and there are at least three z-pseudoatoms in between two c-pseudoatoms; moreover, since there are $n - 2$ distances of level 5 with value $2c$ in the second digit, there must be at least $n - 1$ c-pseudoatoms such that there are always at most 3 z-pseudoatoms in between. Hence, the points in $R$ are such that blocks of three z-pseudoatoms alternate with one c-pseudoatom, starting and ending with a block of three z-pseudoatoms.

level   lower bound      upper bound      distance name             distance value
4n − 4  (n − 2)h + 11H   (n − 2)h + 16H   d[4n − 4, n − 1, n − 2]   d[4,1,0] + (n − 2)·β
4n − 4  (n − 2)h + 10H   (n − 2)h + 14H   d[4n − 4, n − 1, n − 2]   d[4,1,0] + (n − 2)·β
4n − 4  (n − 1)h         (n − 1)h         d[4n − 4, n − 1, n − 1]   d[4,1,1] + (n − 2)·β
4n − 4  (n − 1)h         (n − 1)h         d[4n − 4, n − 1, n − 1]   d[4,1,1] + (n − 2)·β
4n − 3  (n − 1)h + 3H    (n − 1)h + 4H    d[4n − 3, n − 1, n − 1]   d[5,1,1] + (n − 2)·β
4n − 3  (n − 1)h + 4H    (n − 1)h + 6H    d[4n − 3, n − 1, n − 1]   d[5,1,1] + (n − 2)·β
4n − 3  (n − 2)h + 14H   (n − 2)h + 20H   d[4n − 3, n − 1, n − 2]   d[5,1,0] + (n − 2)·β
4n − 2  (n − 1)h + 6H    (n − 1)h + 8H    d[4n − 2, n − 1, n − 1]   d[6,1,1] + (n − 2)·β
4n − 2  (n − 1)h + 8H    (n − 1)h + 12H   d[4n − 2, n − 1, n − 1]   d[6,1,1] + (n − 2)·β
4n − 1  nh               nh               d[4n − 1, n − 1, n]       (4n − 1, (n − 1)c, nh) + δ

Fig. 5. Distances with level 4n − 4 to 4n − 1. Each case occurs once.

Finally, we show that the third digits of each three adjacent z-pseudoatoms sum up to $h$: Consider those distances of level 3 that have a zero in the second digit. There are $n$ such distances, and their third digits sum up to $nh + n\delta$. Each of these distances must span exactly one of the $n$ blocks of three z-pseudoatoms. The total sum of the last digit of all z-pseudoatoms is exactly $\sum_{i=1}^{3n} q_i = nh$. Since the distances of level 3 that span these blocks do not overlap, they have to sum up to the same total. Hence, the error for each such distance of level 3 must be $-\delta$. This implies that each three $q_i$'s that correspond to one block sum up to exactly $h$ (since we have applied error $+\delta$ to each atom to define the z-pseudoatoms). Thus, these triples yield a solution for the 3-Partition instance. □



3 Strong NP-Completeness of PD-RelError

In this section, we show that PD-RelError is strongly NP-complete by using 
a reduction from 3-Partition similar to the one used to prove strong NP- 
completeness of PD-AbsError (see Theorem 4). 

Theorem 5. PD-RelError is strongly NP-complete, even if the error is a 
constant. 

Proof (sketch). The problem is in NP analogously to the proof of Theorem 4. 
The proof of NP-hardness is also along the lines of the proof of Theorem 4. In 
fact, the proof has a similar structure overall, but the details are quite different. 
Given an instance of 3-Partition, we define a multiset E of distances which 
are expressed as numbers with a base Z, with Z = 10ft.nc and c = n"^h^. 

We replace the definition of the atoms as follows: Zj = (1,0, for 

1 < t < 3n, and Cj = (1, c, 0) • for 1 < i < n — 1. All Zi’s and cfs are part 
of the distance set $E$. Note that for a fixed level $\ell$, the corresponding distances $d[\ell, \cdot, \cdot]$ from the proof of Theorem 4 are defined for at most two consecutive values of the second digit, say $j$ and $j + 1$. Here, we define distances $e[\ell, j]$ and $e[\ell, j + 1]$ for all levels $2 \le \ell \le 4n - 1$ and corresponding $j$ or $j + 1$, respectively, as
follows: e[l,j] = (A j, j))- and e[^, j + 1] = {£,j + + !))• 

using values $B_u(\cdot)$ and $B_l(\cdot)$ as specified below.

The first digit $\ell$ still indicates the level of the distance (i.e., how many atoms it will span in a solution) and the second digit $j$ or $j + 1$ indicates the number of c-atoms it will span. Value $B_u(\ell, j)$ is the maximum upper bound from the corresponding column in Figure 3, Figure 4, or Figure 5, taken over all distances $d[\ell, j, \cdot]$ (for Figure 4, these bounds result from Figure 3 by adding appropriate multiples of $h$); similarly, value $B_l(\ell, j + 1)$ is the minimum lower bound from the corresponding column in the figures, taken over all distances $d[\ell, j + 1, \cdot]$. The multiplicity of distance $e[\ell, j]$ is the sum of the multiplicities for all distance values $d[\ell, j, \cdot]$ taken from the same figures, likewise for distance $e[\ell, j + 1]$. Thus,
for example e [5, 1] = (5, 1, 20H) ■ with multiplicity 3(n — 1), while e [6, 2] = 
(6, 2, 15iF) • with multiplicity 2(n — 2). 

For $d[\cdot]$-distances with levels divisible by four (i.e., distances $d[4\ell', j, \cdot]$ with integer $\ell' < n$), we only have one possible value $j$ for the second digit. Thus,
we define the corresponding e [-(-distances by e [4:£',j] = (4£', j, B„(4£', j)) • y^. 
Finally, we define two special distances: e [3, 0] = (3, 0, h) ■ y^, with multiplicity 
n, and e [4n — 1, n — 1] = (4n — 1, (n — l)c, nh) ■ y^ with multiplicity 1. 

All the distances, including the atoms, are put into distance multiset E. 
We set error £ = y^. This completes our description of how to construct a 
PD-RelError instance from a given 3-Partition instance. The proof that a 
solution for the 3-Partition instance yields a solution for the PD-RelError 
instance, and vice versa, as well as the strategy to transform these distances into 
integer distances, can be found in the full version of this paper. □ 

4 Conclusion 

We have shown that Partial Digest is NP-complete if all measurements are 
prone to the same additive or multiplicative error. This answers the question 
whether Partial Digest on real-life data can be solved in polynomial time. 
However, it also gives rise to new questions: While we have shown NP-hardness even for constant relative error, our proof for absolute error uses error $\delta = \frac{h}{4}$, which is not constant. Is Partial Digest still NP-complete if we restrict the additive
error to some (small) constant? What if we allow only one-sided errors (i.e., if 
the lengths of the distances are always underestimated)? Moreover, the main 
open problem is still the computational complexity of Partial Digest itself. 
Abstract. We study the problem of placing symbols of an alphabet onto the minimum number of keys on a small keyboard so that any word of a given dictionary can be recognized uniquely only by looking at the corresponding sequence of pressed keys. This problem is motivated
by the design of small keyboards for mobile devices. We show that the 
problem is hard in general, and NP-complete even if we only wish to 
decide whether two keys are sufficient. We also consider two variants of 
the problem. In the first one, symbols on a same key must be contiguous 
in an ordered alphabet. The second variant is a fixed-parameter version 
of the previous one that minimizes a well-chosen measure of ambiguity 
in the recognition of the words for a given number of keys. Hardness and 
approximability results are given. 



1 Introduction 

Keyboards are by far the most commonly used interfaces for entering textual or numerical data on many communication devices. When such a device is small, a complete keyboard is not always available: this is typically the case for mobile phones. The solution used in that case is the overloading of keys: each key is associated with more than one symbol of the alphabet. The current standard layout for mobile phones is defined by a 1994 ISO specification (cf. Fig. 1 and [1]). Numerous methods allow the user to specify which symbol is needed among those corresponding to the pressed key. The multi-tap method is a widely proposed one: the desired symbol is selected by pressing the same key more than once. Other methods use an algorithm that tries to predict the input at a first-order level according to the sequence of pressed keys and using a dictionary of words. A common implementation of such an algorithm that uses maximum probability estimation is the T9 algorithm [4]. A survey of text entry and disambiguation procedures for mobile phones can be found in a recent paper by MacKenzie and Soukoreff [9]. Recently, many authors considered the problem of estimating the achievable word rate using various methods (see e.g. [3]). While much research related to text entry on mobile devices is conducted in the computer-human interface community, it seems that little of it treats the problem of redefining the actual keyboard layout used. In this paper we consider the problem of defining keyboard layouts with key overloading using an optimal partition of an alphabet $\Sigma$, in the sense that the user can type any word of a dictionary $D$, and that word is always recognized without ambiguity, or a certain measure of ambiguity is minimized. This is, to our knowledge, the first theoretical analysis of this problem.

Fig. 1. Usual mobile keypad as recommended by the ISO standard [1]

A similar issue has nevertheless been investigated by Lesher et al. in [8]. 
They study the problem of arranging characters on a small keyboard with key 
overloading so that the keystroke efficiency is maximized. A heuristic local opti- 
mization algorithm is proposed, based on iterative permutation of a fixed num- 
ber of characters. The objective function, however, is computed based on the 
assumption of a character-level disambiguation procedure, and without any ref- 
erence to a dictionary. Only very superficial considerations on the complexity 
and approximability of the problem are given. 

In section 2 we give a formal definition of the problem and prove that it 
is NP-hard in general, not approximable within (unless NP=coRP), 

and remains hard even if we restrict it to two keys. In section 3 we consider a variant in which letters on the same key must be contiguous in an ordered alphabet. We prove that this variant is NP-hard as well, but admits a $(1 + 2 \ln |D|)$ factor approximation algorithm, which is the best possible within a constant factor. Ambiguous keyboards, in which a well-chosen measure of ambiguity is minimized, are considered in section 4. The ambiguity measure is related to the average number of keys that have to be pressed to resolve an ambiguity. It is useful in practice and can allow for a nonuniform probability distribution over $D$. We show a constant factor approximation for this version of the problem. Finally, in section 5, we describe a linear-time algorithm for measuring the ambiguity and exhibit optimal ambiguous keyboards for an English dictionary. Interestingly, the optimal layout we found for eight keys is quite different from the standard one and requires on average less than half the number of keystrokes to resolve an ambiguity.

2 General Formulation 



We first formalize the problem of designing a keyboard with key overloading 
that allows unambiguous recognition of any word in a given dictionary. 
Definition 1 (Keyboard) An instance of Keyboard is composed of an alphabet $\Sigma$ and a dictionary $D \subseteq \Sigma^*$. A solution of this instance is a partition of $\Sigma$ such that for any pair $x, y \in D$ with $x = (x_1, x_2, \ldots, x_{|x|})$ and $y = (y_1, y_2, \ldots, y_{|y|})$ either $|x| \ne |y|$ or there exists an index $i$ such that $x_i$ and $y_i$ are in different subsets of the partition. The objective function to minimize is the size of the partition.

In coloring terminology, this problem can be seen as a minimal coloring of the symbols of an alphabet such that any word of a given dictionary can be recognized uniquely only by looking at the corresponding sequence of colors.

Example 1 Let $\Sigma = \{a, b, c, d\}$ and $D = \{abcd, dabb, bbcc, addb\}$. The partition of $\Sigma$ in the two subsets $\{a, b, c\}$ and $\{d\}$ is an optimal solution of this instance of Keyboard. If we replace each occurrence of a symbol in $\Sigma$ by '1' if it belongs to the first subset, and by '2' if it belongs to the second, we obtain the following set: {1112, 2111, 1111, 1221}, with four distinct words.
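
The feasibility condition of Definition 1 is easy to test directly. The following Python sketch is an illustration only; the names `encode` and `is_valid` and the representation of a partition as a list of sets are assumptions. It maps each word to its sequence of key indices and checks that distinct words receive distinct sequences (words of different lengths are distinguished automatically).

```python
def encode(word, partition):
    """Map a word to the tuple of (1-based) key indices of its symbols."""
    key_of = {symbol: index
              for index, block in enumerate(partition, start=1)
              for symbol in block}
    return tuple(key_of[symbol] for symbol in word)

def is_valid(dictionary, partition):
    """Return True if the partition distinguishes all words of the dictionary."""
    words = set(dictionary)
    codes = {encode(word, partition) for word in words}
    return len(codes) == len(words)
```

Applied to Example 1, the partition into {a, b, c} and {d} yields the four distinct key sequences listed above, whereas a single key carrying all four symbols maps every word to the same sequence.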

The following definition is useful in the NP-hardness proof for Keyboard. 

Definition 2 (Graph-Coloring) An instance of Graph-Coloring is com- 
posed of a graph (V,E). A solution is a partition of the set V of vertices such 
that any two adjacent vertices are in different subsets. The objective function to 
minimize is the size of the partition. 

Theorem 1 Keyboard is NP-hard. 

Proof. By reduction from Graph-Coloring, as follows. Let $\Sigma$ be defined as $V$. Select an edge $pq$, and to each edge of the graph associate a unique word made of the two symbols $p$ and $q$, of size $l = \lceil \log_2 |E| \rceil$. For each edge $ab \in E$, let $w_{ab}$ be this word. $D$ is composed of words of equal length $l + 1$ of the form $w_{ab}a$, $w_{ab}b$ for each edge $ab$. The word pair corresponding to the edge $pq$ is $\{p^{l+1}, p^l q\}$, hence $w_{pq} = p^l$. From this, $p$ and $q$ must be in different subsets, hence the words $w_e$ and $w_{e'}$ for distinct edges $e$ and $e'$ are always distinguishable. On the other hand, when $a$ and $b$ are adjacent, $w_{ab}a$ and $w_{ab}b$ belong to $D$ and therefore $a$ and $b$ must be in different subsets. In this reduction, $\Sigma = V$, $|D| = 2|E|$, and $D$ is composed of $2|E|$ words of size $l + 1$. □
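
The construction in this proof can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, words are represented as tuples of vertices, the first edge of the input list plays the role of $pq$, and prefixes are assigned to edges in lexicographic order (so the first edge receives $p^l$).

```python
from itertools import product

def coloring_to_keyboard(vertices, edges):
    """Build a Keyboard instance (alphabet, dictionary) from a graph."""
    p, q = edges[0]
    l = (len(edges) - 1).bit_length()      # equals ceil(log2 |E|) for |E| >= 1
    prefixes = product((p, q), repeat=l)   # first prefix is (p, ..., p)
    dictionary = []
    for (a, b), w in zip(edges, prefixes):
        dictionary.append(w + (a,))        # word w_ab a
        dictionary.append(w + (b,))        # word w_ab b
    return set(vertices), dictionary

def distinguishes(partition, dictionary):
    """Check the Keyboard condition for words of equal length."""
    key_of = {s: i for i, block in enumerate(partition) for s in block}
    codes = {tuple(key_of[s] for s in word) for word in dictionary}
    return len(codes) == len(set(dictionary))
```

On small graphs, a partition of the vertices passes `distinguishes` exactly when it is a proper coloring, which is the correspondence the proof relies on.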

Example 2 Suppose we want to color the graph on Fig. 2. We encode this instance by setting: $\Sigma = \{p, q, r, s, t, u\}$ and

D = {pppp, pppq, ppqr, ppqs, pqps, pqpq, pqqs, pqqt,
qpps, qppu, qpqr, qpqt, qqpr, qqpu, qqqt, qqqu}.

Each edge $ab$ on the graph of Fig. 2(a) is labeled by the word $w_{ab}$.

Although many other reductions are possible, we believe this one is interesting 
because it combines two useful properties. First, the size of the alphabet is equal 
to \V\. This means that nonapproximability results for Graph-Coloring can 
be transposed to Keyboard. In particular, a recent contribution from Bellare 
et al. [2] implies the following (assuming NP ≠ coRP).
Fig. 2. Example graph. (a) Reduction in Theorem 1. (b) Reduction in Theorem 2.

Corollary 1 Keyboard is not approximable within Ve > 0. 

As a second property, the result also holds in the case where words in D are 
constrained to have the same size. In general, results presented in this paper are 
also valid for the case where the words are constrained to have the same size. 

This NP-hardness result does not tell us whether testing the existence of 
a partition of size two is NP-complete, since testing the two-colorability of a 
graph is a polynomial problem. We provide another reduction using the decision 
version of Graph-Coloring. 

Theorem 2 Asking for the existence of a feasible solution of a given size $K$ in an instance of Keyboard is NP-complete for any $K \ge 2$.

Proof. Let us prove this for $K = 2$. We use a reduction from the problem of testing the existence of a $2^M$-coloring of a graph, for any $M > 1$. This reduction has the same flavor as the previous one. We use two symbols $x$ and $y$ to define the prefixes of size $l = \lceil \log_2(|E| + 1) \rceil$ identifying edges of the graph. The two words $x^{l+M}$ and $x^l y^M$ of size $l + M$ are first included in $D$. Hence the first prefix $x^l$ is only used to make $x$ and $y$ distinguishable. Then we associate to each vertex $a \in V$ a word $v_a$ of size $M$ made of previously unused symbols. For each edge $ab$, we include the two words $w_{ab}v_a$ and $w_{ab}v_b$ in $D$, where $w_{ab}$ is a prefix identifying edge $ab$, distinct from $x^l$. In this way, two-coloring the symbols of a word $v_a$ corresponds to assigning to vertex $a$ a color in the range $\{0, 1, \ldots, 2^M - 1\}$. In this reduction, $|\Sigma| = 2 + M|V|$, $|D| = 2 + 2|E|$, and $D$ is made of words of equal size $l + M$. □

Example 3 We consider the graph on Fig. 2 and encode the problem of testing
whether this graph has a coloring of size 4 (M = 2). We define Σ =
{x, y, a, b, c, d, e, f, g, h, i, j, k, l} and

D = {xxxxxx, xxxxyy, xxxycd, xxxygh, 

xxyxef, xxyxgh, xxyyij, xxyygh, xyxxij, xyxxef, 

xyxyij, xyxykl, xyyxef, xyyxkl, xyyykl, xyyygh, yxxxab, yxxxcd} 

The reduction is illustrated on Fig. 2(b). 
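The construction in the proof of Theorem 2 can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names, the choice of fresh symbols (taken from the letters a to w) and the brute-force feasibility check are hypothetical and are not part of the original paper.

```python
from itertools import product
from math import ceil, log2


def keyboard_instance(vertices, edges, M):
    """Dictionary D of the Theorem 2 reduction (K = 2 keys, 2**M colors)."""
    l = ceil(log2(len(edges) + 1))                 # prefix length
    prefixes = ("".join(p) for p in product("xy", repeat=l))
    reserved = next(prefixes)                      # x^l, used only by the two special words
    D = [reserved + "x" * M, reserved + "y" * M]
    # Assumption: at most 23 fresh symbols are needed (letters a..w).
    fresh = iter("abcdefghijklmnopqrstuvw")
    word = {v: "".join(next(fresh) for _ in range(M)) for v in vertices}
    for (a, b), prefix in zip(edges, prefixes):
        D.append(prefix + word[a])                 # w_ab v_a
        D.append(prefix + word[b])                 # w_ab v_b
    return D


def feasible(D, key_of):
    """True if no two words of D are typed with the same key sequence."""
    seen = set()
    for w in D:
        code = tuple(key_of[s] for s in w)
        if code in seen:
            return False
        seen.add(code)
    return True


def has_two_key_solution(D):
    """Brute force over all assignments of symbols to two keys (tiny inputs only)."""
    symbols = sorted(set("".join(D)))
    return any(feasible(D, dict(zip(symbols, keys)))
               for keys in product((0, 1), repeat=len(symbols)))
```

With M = 1 the reduction encodes 2-colorability, so a triangle should yield an infeasible instance and a path a feasible one; this is a convenient sanity check even though the theorem itself needs M > 1.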
Again, we point out that the result holds even in the particular case when words
have equal lengths. 

3 Keyboards with Contiguous Symbols on Each Key 

In the previous section, we assumed that symbols of the alphabet could be put 
anywhere on the keyboard. In other words, the partition of Σ is chosen among
all possible partitions. We now consider a more realistic problem in which the 
alphabet is ordered, and keys of the keyboard are constrained to represent only 
contiguous alphabet symbols. We show that this constrained variant has very 
strong connections with the set cover problem. 

Definition 3 (Contiguous-Keyboard) An instance of Contiguous-Keyboard
is composed of an ordered alphabet Σ and a dictionary $D \subseteq \Sigma^*$. A
solution of this instance is a partition of Σ such that

1. each subset of the partition is composed of consecutive symbols of Σ,

2. for any pair $x, y \in D$ with $x = (x_1, x_2, \ldots, x_{|x|})$ and $y = (y_1, y_2, \ldots, y_{|y|})$,
either $|x| \neq |y|$ or there exists an index i such that $x_i$ and $y_i$ are in different
subsets of the partition.

The objective function to minimize is the size of the partition. 

We briefly recall the definition of the set cover problem. 

Definition 4 (Set-Cover) An instance of Set-Cover is composed of a 
ground set S and a set E of subsets of S. A solution is a subset of E that 
covers each element of S. The objective function to minimize is the size of this 
subset. 

Theorem 3 Any instance of Contiguous-Keyboard can be encoded as an 
instance of Set-Cover. 

Proof. Let us first remark that finding a partition of Σ whose subsets are composed
of contiguous elements amounts to selecting separators in $\{1, 2, \ldots, |\Sigma| - 1\}$.
The partition is then defined as follows: for each selected separator i, all
symbols of rank less than or equal to i in Σ are in a different subset than those with
rank higher than i.

To each separator i in $\{1, 2, \ldots, |\Sigma| - 1\}$, we associate the set

$C_i = \{\{v, w\} \mid v, w \in D \wedge |v| = |w| \wedge \exists j\, (\mathrm{rank}(v_j) \le i \wedge \mathrm{rank}(w_j) > i)\}$,

that is, the set of unordered word pairs of equal lengths that are made distinguishable
by selecting the separator i. The optimization now consists in finding
the minimal set of subsets in $E = \{C_1, C_2, \ldots, C_{|\Sigma|-1}\}$ such that all the unordered
word pairs in $S = \{\{v, w\} \mid v, w \in D \wedge |v| = |w|\}$ are covered. □

Corollary 2 Contiguous-Keyboard is approximable within $1 + \ln|S| \le 1 + 2\ln|D|$,
where $S = \{\{v, w\} \mid v, w \in D \wedge |v| = |w|\}$.
Proof. It is well known that Set-Cover is approximable within $1 + \ln|S|$ using
the greedy covering algorithm [6,11]. The size of the partition in Contiguous-Keyboard
is one more than the number of separators selected in the covering.
If we denote by CK (resp. $CK_{OPT}$) the approximate (resp. optimal) solution
of Contiguous-Keyboard and by SC (resp. $SC_{OPT}$) the approximate (resp.
optimal) solution of Set-Cover, we have $SC \le (1 + \ln|S|)\,SC_{OPT}$, hence

$CK - 1 \le (1 + \ln|S|)(CK_{OPT} - 1)$

$CK \le (1 + \ln|S|)\,CK_{OPT} - 1 - \ln|S| + 1$

$CK \le (1 + \ln|S|)\,CK_{OPT}$.

□ 
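The encoding of Theorem 3 and the greedy covering algorithm behind Corollary 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the alphabet is assumed to be given as a string in rank order, and ties in the greedy step are broken by the smallest separator index.

```python
from itertools import combinations


def separator_sets(alphabet, D):
    """Return (S, C): S is the set of equal-length word pairs, C[i] the pairs split by separator i."""
    rank = {s: r for r, s in enumerate(alphabet, start=1)}
    pairs = [(v, w) for v, w in combinations(sorted(set(D)), 2) if len(v) == len(w)]
    C = {}
    for i in range(1, len(alphabet)):
        # separator i distinguishes v and w if, at some position, exactly one symbol has rank <= i
        C[i] = {(v, w) for v, w in pairs
                if any((rank[a] <= i) != (rank[b] <= i) for a, b in zip(v, w))}
    return set(pairs), C


def greedy_separators(alphabet, D):
    """Greedy set cover: repeatedly pick the separator covering most uncovered pairs."""
    universe, C = separator_sets(alphabet, D)
    uncovered, chosen = set(universe), []
    while uncovered:                      # always terminates: distinct words differ somewhere
        best = max(C, key=lambda i: len(C[i] & uncovered))
        chosen.append(best)
        uncovered -= C[best]
    return sorted(chosen)
```

The number of keys of the resulting keyboard is `len(greedy_separators(alphabet, D)) + 1`, which is the quantity bounded in the proof above.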

So far, it is still not clear whether Contiguous-Keyboard is NP-hard or not. 
We could imagine that some structure available in Contiguous-Keyboard 
could be used by a polynomial algorithm to solve it to optimality. The next 
theorem shows that this is not the case. 

Theorem 4 Any instance of Set-Cover can be encoded as an instance of
Contiguous-Keyboard.

Proof. We first remark that the only way to distinguish two consecutive symbols
of ranks i and i + 1 is to select separator i. It is then possible to encode a Set-Cover
problem in a Contiguous-Keyboard problem, by associating a pair
of words in D to each element of S, and crafting them carefully so that they are
contained in only a certain number of subsets in $E = \{C_1, C_2, \ldots, C_{|E|}\}$. First,
let $|\Sigma| = |E| + 1$. Let us consider an element x of S and construct a corresponding
pair of words {v, w} in D. For each i such that x is contained in $C_i$, we simply
append the symbol of rank i to v and the symbol of rank i + 1 to w. We also
need to always distinguish words of different pairs. To achieve this, we can give
the words of different pairs different lengths by concatenating them with
different numbers of copies of themselves. We have |D| = 2|S| and a polynomial
reduction. □

Corollary 3 Contiguous-Keyboard is NP-hard and not approximable
within $c \log |D|$, for some constant c > 0.

The inapproximability result comes from [10]. 

Example 4 Let S = {1, 2, 3, 4} and E = {{1, 2}, {2, 3}, {1, 3, 4}}. We translate
this Set-Cover problem into a Contiguous-Keyboard problem by letting
Σ = (a, b, c, d) and

D = {ac, bd, abab, bcbc, bdbdbd, cccccc, c, d}.

In this example, the pair {ac, bd} represents element 1 ∈ S, found in the first
and third subsets. The word pair is therefore separated by the separator 1 between
a and b and by the separator 3 between c and d. The distinction between words
corresponding to different elements of S is ensured by the variation in length.
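The encoding used in the proof of Theorem 4 can be sketched as below. Several details are assumptions made for illustration only: the function names, the use of lowercase letters as ranked symbols, and the rule for choosing the number of copies (the smallest count giving a word length not used by an earlier pair). The proof only asks for distinct lengths, so the words produced may differ from those of Example 4 while encoding the same instance.

```python
def encode_set_cover(ground, subsets):
    """Return (alphabet, D) for the Set-Cover instance (ground, [C_1, ..., C_m])."""
    m = len(subsets)
    alphabet = [chr(ord("a") + r) for r in range(m + 1)]   # hypothetical symbol names
    D, used_lengths = [], set()
    for x in ground:
        hits = [i for i in range(1, m + 1) if x in subsets[i - 1]]
        if not hits:
            raise ValueError(f"element {x!r} is in no subset, so it cannot be covered")
        v = "".join(alphabet[i - 1] for i in hits)          # symbols of rank i
        w = "".join(alphabet[i] for i in hits)              # symbols of rank i + 1
        copies = 1
        while copies * len(hits) in used_lengths:           # force a fresh word length
            copies += 1
        used_lengths.add(copies * len(hits))
        D += [v * copies, w * copies]
    return alphabet, D


def splitting_separators(v, w, alphabet):
    """Separators i that put some v_j and w_j on different sides."""
    rank = {s: r for r, s in enumerate(alphabet, start=1)}
    return {i for i in range(1, len(alphabet))
            if any((rank[a] <= i) != (rank[b] <= i) for a, b in zip(v, w))}
```

For each element x, the separators that split its word pair are exactly the indices of the subsets containing x, which is the property the reduction relies on.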

A variant of this reduction in which the words of D are constrained to have the 
same length could use a system of prefixes, as in the two previous proofs. 
4 Ambiguous Keyboards

When dealing with large dictionaries, it is likely that an optimal partition in 
both of the preceding problems would be quite large, and maybe even of the size 
of the alphabet itself. It is therefore interesting to consider the problem of an 
ambiguous keyboard, in which the number of keys is constrained to be at most 
K , and some well-defined measure of ambiguity between words is minimized. 

Definition 5 (Ambiguity) A partition of Σ defines a confusability relation
between words in D:

$R = \{\{v, w\} \mid v, w \in D \wedge |v| = |w| \wedge \forall i : v_i \text{ and } w_i \text{ are in the same subset}\}$

R is an equivalence relation, hence it partitions D in a set C of equivalence
classes. From this observation, we define

— the number of ambiguous pairs $P = \sum_{c \in C} \binom{|c|}{2}$,

— the number of nonambiguous pairs $\bar{P} = |S| - P$,

— the ambiguity $A = P/|D|$,

— the nonambiguity $\bar{A} = \bar{P}/|D|$.

The motivation for using this ambiguity measure is the use of a selection sys- 
tem. When a user types an ambiguous word, the selection system allows him to 
select the word he actually wishes to enter among the list of words in the same 
equivalence class. If the first word in the list is the correct one, no further key 
needs to be pressed. One click on the “scroll down” key allows him to select the 
second word. In general, i − 1 clicks are necessary for selecting the ith word in
the list. Hence the average number of clicks for the selection of a word in an
equivalence class c is $\left(\sum_{i=1}^{|c|} (i - 1)\right)/|c| = \binom{|c|}{2}/|c|$, and the overall average number
of clicks needed per word is $A = P/|D|$. This naturally holds only under the
assumption of uniform probability distribution of words in D. 



Fig. 3. Graph of a confusability relation between words



Example 5 Fig. 3 shows the graph of a confusability relation between words
obtained when partitioning the alphabet Σ = {a, b, c, d, e, f} in subsets {a, b},
{c, d} and {e, f}. We have:

D = {ace, acf, ade, bdf, ad, ac, bc, be, af}

C = {{ace, acf, ade, bdf}, {ad, ac, bc}, {be, af}}

It is easy to check that A = P/|D| = 10/9 is also the average number of clicks
per word in the selection system.

For simplicity, we will concentrate on the fixed-parameter version of 
Contiguous-Keyboard only. 

Definition 6 (K-Contiguous-Keyboard) An instance of K-Contiguous- 
Keyboard is an instance of Contiguous-Keyboard enriched with an integer
K. A solution of this instance is a partition of Σ of size K satisfying the
constraints in Contiguous-Keyboard. The problem is parameterized by the 
(non) ambiguity measure that is to be minimized (maximized). 

To indicate which ambiguity measure is used, we append one of the symbols A or
Ā in parentheses. Although the two problems have the same optimal solutions,
an approximation algorithm for one problem is not necessarily an approximation 
algorithm for the other, which is why we distinguish the two. 

We now show that this fixed-parameter version of Contiguous-Keyboard 
corresponds to the fixed-parameter version of Set-Cover. 

Definition 7 (Max-Coverage) An instance of the Max-Coverage problem
is an instance of Set-Cover enriched with an integer K. A solution is a subset
of E of size K that covers some elements of S. The objective function to maximize
is the number of elements covered.

Theorem 5 Any instance of K-Contiguous-Keyboard(Ā) can be encoded
as an instance of Max-Coverage and any instance of Max-Coverage can
be encoded as an instance of K-Contiguous-Keyboard(Ā).

Proof. The proofs are the same as those of Theorems 3 and 4. We just have to
remark that the parameter is not the same: K-Contiguous-Keyboard(Ā) for
a certain K reduces to a Max-Coverage problem with parameter K − 1. □

Corollary 4 (Approximation) K-Contiguous-Keyboard(Ā) is approximable
within a factor 1 − 1/e.

Proof. From the approximation yielded for Max-Coverage by the greedy al- 
gorithm, proved in [5,7]. □ 

The developments above also hold in the case where the probability distribution 
of the words in D is not uniform. Let us assume that a probability $p_v$ is assigned
to each word v in D, with $\sum_{v \in D} p_v = 1$. The average number of clicks per
word can be computed easily if we assume that the selection system presents the 
words in decreasing probability order in each equivalence class of C . We obtain 
the following generalized objective functions. 

Definition 8 (Weighted ambiguity)

$A = \sum_{c \in C} \sum_{v \in c} p_v \cdot \mathrm{rank}_c(v) = \sum_{c \in C} \sum_{\{v, w\} \subseteq c} \min(p_v, p_w)$

$\bar{A} = \left( \sum_{v \in D} p_v \cdot \mathrm{rank}_D(v) \right) - A$

The function $\mathrm{rank}_c$ sorts the words in a set c: the most probable word has rank
0, the second most probable has rank 1, and so on. There is no need to normalize
here, and A is the average number of clicks per word.



Fig. 4. Some optimal solutions of K-Contiguous-Keyboard with Σ = {a, b, . . . , z}
and a dictionary of 885 frequent words in English: (a) K = 6, A = 63/885;
(b) K = 8, A = 22/885; (c) K = 12, A = 3/885



Example 6 Let us assume that the words ace, acf, ade and bdf are in the same
equivalence class c of C, and that $p_{ace} > p_{acf} > p_{ade} > p_{bdf}$. The average number
of clicks to select one of the words is

$(p_{ace} \cdot 0 + p_{acf} \cdot 1 + p_{ade} \cdot 2 + p_{bdf} \cdot 3)/(\sum_{v \in c} p_v) = (\sum_{\{v, w\} \subseteq c} \min(p_v, p_w))/(\sum_{v \in c} p_v)$.

By assigning the weight $\min(p_v, p_w)$ to each edge {v, w} ∈ S, we can see
that the weighted version of K-Contiguous-Keyboard reduces to a weighted
maximum coverage problem, hence the corresponding variant K-Contiguous-Keyboard(Ā)
remains approximable within 1 − 1/e, as in the unweighted case
[5,7].
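The identity used in Definition 8 and Example 6, namely that the rank-weighted sum over a class equals the sum of min(p_v, p_w) over its pairs, is easy to check numerically. The sketch below uses hypothetical function names and made-up probabilities; it is only meant to illustrate the identity.

```python
from itertools import combinations


def clicks_by_rank(probs):
    """Sum of p_v * rank(v), where the most probable word has rank 0."""
    ordered = sorted(probs, reverse=True)
    return sum(p * r for r, p in enumerate(ordered))


def clicks_by_pairs(probs):
    """Sum of min(p_v, p_w) over all unordered pairs of words in the class."""
    return sum(min(p, q) for p, q in combinations(probs, 2))
```

Both functions return the same value because the word of rank r is the less probable member of exactly r pairs.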

5 Examples of Optimized Keyboards 

We now present some examples of optimal keyboards for the Latin alphabet
and a dictionary of 885 frequent English words. This file was obtained from the
Letter-by-Letter Word Games FAQ website. It has been filtered by elimination 
of uppercase letters. 

We concentrate on keyboards with contiguous symbols on each key, more 
precisely on optimal solutions of K-Contiguous-Keyboard(A), i.e. keyboards 
that minimize the number of ambiguous word pairs. Exhaustive searching is 
affordable here: we have at most $\binom{|\Sigma|-1}{K-1} = \binom{25}{K-1} \le \binom{25}{12} = 5{,}200{,}300$ different
partitions. For each possible partition of size K an algorithm for computing the
ambiguity measure A is run. We show that this can be done in time linear in
|D|.
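The exhaustive search amounts to enumerating all choices of K − 1 separators among the |Σ| − 1 positions. A minimal sketch follows, assuming the alphabet is given as a string in rank order; the generator name is hypothetical.

```python
from itertools import combinations
from math import comb


def contiguous_partitions(alphabet, K):
    """Yield every partition of `alphabet` into K blocks of consecutive symbols."""
    n = len(alphabet)
    for separators in combinations(range(1, n), K - 1):
        cuts = (0, *separators, n)
        yield [alphabet[a:b] for a, b in zip(cuts, cuts[1:])]


# Number of candidate keyboards for a 26-letter alphabet and K keys.
def partition_count(K, alphabet_size=26):
    return comb(alphabet_size - 1, K - 1)
```

For a 26-letter alphabet the count is largest when K − 1 is 12 or 13, which gives the bound quoted in the text.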

Theorem 6 Given a dictionary D of words made of symbols in Σ and a partition
of Σ, it is possible to check the feasibility condition in Keyboard or the
ambiguity of the partition in time O(|D|), provided that the maximum length of
a word in D is constant.
Proof. To achieve this complexity in the worst case, we can store the dictionary
D in a decision tree and merge the symbols in breadth-first order. In practice,
the algorithm can advantageously be implemented using a hash table: for each 
word of D, the existence of a previously seen word with the same subset sequence 
can be checked in constant average time. □ 
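The hash-table variant mentioned in the proof can be sketched as follows. The function name and the representation of a partition as a list of strings are assumptions made for illustration; the test dictionary is the one of Example 5, read with bc as its seventh word, which is what the listed equivalence classes require.

```python
from collections import Counter


def ambiguity(D, partition):
    """Return (P, A): number of ambiguous pairs and ambiguity A = P / |D|."""
    key_of = {s: k for k, block in enumerate(partition) for s in block}
    # Each word is hashed by its key sequence; words of different lengths never collide.
    classes = Counter(tuple(key_of[s] for s in w) for w in D)
    P = sum(n * (n - 1) // 2 for n in classes.values())
    return P, P / len(D)
```

The partition is feasible for Keyboard exactly when the returned P is 0, so the same pass over D answers both questions of Theorem 6.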

Optimal solutions are shown on Fig. 4. It is interesting to compare Fig. 4(b) 
with the standard layout of Fig. 1. We computed the ambiguity A of the latter 
and obtained A = 57/885. The individual keys for the letters l, o, s and t are
noticeable on Fig. 4(c).

6 Conclusion 

We proposed an analysis of an original keyboard design problem, formulated
as a combinatorial optimization problem. This is the first theoretical approach to such a
problem, and realistic assumptions were made that certainly make this approach
directly useful in practice. As future research, it would be interesting to give
other approximability or nonapproximability results for ambiguous keyboards 
with alternative ambiguity measures or selection systems. It is also likely that 
this problem appears in other contexts, such as sequence analysis. 
Abstract. We study the metric properties of finite subsets of $L_1$. The
analysis of such metrics is central to a number of important algorithmic
problems involving the cut structure of weighted graphs, including the
Sparsest Cut Problem, one of the most compelling open problems in the
field of approximation. Additionally, many open questions in geometric
non-linear functional analysis involve the properties of finite subsets of
$L_1$.

We present some new observations concerning the relation of $L_1$ to dimension,
topology, and Euclidean distortion. We show that every n-point
subset of $L_1$ embeds into $L_2$ with average distortion $O(\sqrt{\log n})$, yielding
the first evidence that the conjectured worst-case bound of $O(\sqrt{\log n})$
is valid. We also address the issue of dimension reduction in $L_p$ for
$p \in (1,2)$. We resolve a question left open in [1] about the impossibility
of linear dimension reduction in the above cases, and we show that the
example of [2,3] cannot be used to prove a lower bound for the non-linear
case. This is accomplished by exhibiting constant-distortion embeddings
of snowflaked planar metrics into Euclidean space.



1 Introduction 

This paper is devoted to the analysis of metric properties of finite subsets of $L_1$.
Such metrics occur in many important algorithmic contexts, and their analysis
is key to progress on some fundamental problems. For instance, an O(log n)-approximate
max-flow/min-cut theorem proved elusive for many years until, in
[4,5], it was shown to follow from a theorem of Bourgain stating that every metric
on n points embeds into $L_1$ with distortion O(log n).

The importance of $L_1$ metrics has given rise to many problems and conjectures
that have attracted a lot of attention in recent years. Four basic problems
of this type are as follows.
I. Is there an $L_1$ analog of the Johnson-Lindenstrauss dimension reduction
lemma [6]?

II. Are all n-point subsets of $L_1$ $O(\sqrt{\log n})$-embeddable into Hilbert space?

III. Are all squared-$\ell_2$ metrics O(1)-embeddable into $L_1$?

IV. Are all planar graphs O(1)-embeddable into $L_1$?

(We recall that a squared-$\ell_2$ metric is a space (X, d) for which $(X, d^{1/2})$ embeds
isometrically in a Hilbert space.)

Each of these questions has been asked many times before; we refer to [7,8,9,
10], in particular. Despite an immense amount of interest and effort, the metric
properties of $L_1$ have proved quite elusive; hence the name “The mysterious $L_1$”
appearing in a survey of Linial at the ICM in 2002 [9]. In this paper, we attempt
to offer new insights into the above problems and touch on some relationships
between them.



1.1 Results and Techniques 

Euclidean distortion. Our first result addresses problem (II) stated above. 
We show that the answer to this question is positive on average, in the following 
sense. 



Theorem 1. For every $f_1, \ldots, f_n \in L_1$ there is a linear operator $T : L_1 \to L_2$
such that

$\frac{\|T(f_i) - T(f_j)\|_2}{\|f_i - f_j\|_1} \ge \frac{1}{\sqrt{8 \log n}}, \quad 1 \le i < j \le n,$

$\frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left( \frac{\|T(f_i) - T(f_j)\|_2}{\|f_i - f_j\|_1} \right)^{1/2} \le 10.$

In other words, for any n-point subset in $L_1$, there exists a map into $L_2$ such that
distances are contracted by at most $O(\sqrt{\log n})$ and the average expansion is O(1).
This yields the first positive evidence that the conjectured worst-case bound of
$O(\sqrt{\log n})$ holds. We remark that a different notion of average embedding was
recently studied by Rabinovich [11]; there, one tries to embed (planar) metrics
into the line such that the average distance does not change too much.

The exponent 1/2 above has no significance, and we can actually obtain the
same result for any power 1 − ε, ε > 0 (we refer to Section 2 for details). The proof
of Theorem 1 follows from the following probabilistic lemma, which is implicit
in [12]. We believe that this result is of independent interest.



Lemma 1. There exists a distribution over linear mappings $T : L_1 \to L_2$ such
that for every $x \in L_1 \setminus \{0\}$ the random variable $\|T(x)\|_2 / \|x\|_1$ has density $\frac{e^{-1/(4x^2)}}{x^2 \sqrt{\pi}}$.

In contrast to Theorem 1, we show that problem (II) cannot be resolved positively
using linear mappings. Specifically, we show that there are arbitrarily large
n-point subsets of $L_1$ such that any linear embedding of them into $L_2$ incurs distortion
$\Omega(\sqrt{n})$. As a corollary we settle the problem left open by Charikar and
Sahai in [1], whether linear dimension reduction is possible in $L_p$, $p \notin \{1, 2\}$. The
case p = 1 was proved in [1] via linear programming techniques, and it seems
impossible to generalize their lower bound to arbitrary $L_p$. We show that there
are arbitrarily large n-point subsets $X \subseteq L_p$ (namely, the same point set used
in [1] to handle the case p = 1), such that any linear embedding of X into $\ell_p^d$ incurs
distortion $\Omega\left((n/d)^{|1/p - 1/2|}\right)$, thus linear dimension reduction is impossible
in any $L_p$, $p \neq 2$. Additionally, we show that there are arbitrarily large n-point
subsets $X \subseteq L_1$ such that any linear embedding of X into any d-dimensional normed
space incurs distortion $\Omega(\sqrt{n/d})$. This generalizes the Charikar-Sahai result to
arbitrary low dimensional norms.

Dimension reduction. In [2], and soon after in [3], it was shown that if the
Newman-Rabinovich diamond graph on n vertices α-embeds into $\ell_1^d$ then $d \ge
n^{\Omega(1/\alpha^2)}$. The proof in [2] is based on a linear programming argument, while the
proof in [3] uses a geometric argument which reduces the problem to bounding
from below the distortion required to embed the diamond graph in $\ell_p$, 1 < p < 2.
These results settle the long standing open problem of whether there is an $L_1$
analog of the Johnson-Lindenstrauss dimension reduction lemma [6]. (In other
words, they show that the answer to problem (I) above is No.) In Section 4,
we show that the method of proof in [3] can be used to provide an even more
striking counterexample to this problem.

A metric space X is called doubling with constant C if every ball in X can be
covered by C balls of half the radius. Doubling metrics with bounded doubling
constants are widely viewed as low dimensional (see [13,14] for some practical
and theoretical applications of this viewpoint). On the other hand, the doubling
constant of the diamond graphs is $\Omega(\sqrt{n})$ (where n is the number of points).
Based on a fractal construction due to Laakso [15] and the method developed
in [3], we prove the following theorem, which shows a strong lower bound on the
dimension required to represent uniformly doubling subsets of $L_1$.

Theorem 2. There are arbitrarily large n-point subsets $X \subseteq L_1$ which are doubling
with constant 6 but such that every α-embedding of X into $\ell_1^d$ requires
$d \ge n^{\Omega(1/\alpha^2)}$.

In [16,13] it was asked whether any subset of $\ell_2$ which is doubling well-embeds
into $\ell_2^d$ (with bounds on the distortion and the dimension that depend only on
the doubling constant). In [13], it was shown that a similar property cannot hold
for $\ell_1$. Our lower bound exponentially strengthens that result.

Planar metrics. Our final result addresses problems (III) and (IV). Our motivation
was an attempt to generalize the argument in [3] to prove that dimension
reduction is impossible in $L_p$ for any 1 < p < 2. A natural approach to this
problem is to consider the point set used in [2,3] (namely, a natural realization
of the diamond graph, G, in $L_1$) with the metric induced by the $L_p$ norm instead
of the $L_1$ norm. This is easily seen to amount to proving lower bounds on the
distortion required to embed the metric space $(G, d_G^{1/p})$ in $\ell_p^h$. Unfortunately, this
approach cannot work since we show that, for any planar metric (X, d) and any
0 < ε < 1, the metric space $(X, d^{1-\varepsilon})$ embeds in Hilbert space with distortion
$O(1/\sqrt{\varepsilon})$, and then using results of Johnson and Lindenstrauss [6], and Figiel,
Lindenstrauss and Milman [17], we conclude that this metric can be
embedded in $\ell_p^h$, where h = O(log n). The proof of this interesting fact is a
straightforward application of Assouad’s classical embedding theorem [18] and
Rao’s embedding method [19]. The $O(1/\sqrt{\varepsilon})$ upper bound is shown to be tight
for every value 0 < ε < 1. We note that the case ε = 1/2 has been previously
observed by A. Gupta in his (unpublished) thesis.



2 Average Distortion Euclidean Embedding of Subsets
of $L_1$

The heart of our argument is the following lemma which is implicit in [12], and 
which seems to be of independent interest. 

Lemma 2. For every 0 < p < 2 there is a probability space (Ω, P) such that
for every ω ∈ Ω there is a linear operator $T_\omega : L_p \to L_2$ such that for every
$x \in L_p \setminus \{0\}$ the random variable $X = \|T_\omega(x)\|_2 / \|x\|_p$ satisfies for every a ≥ 0,
$\mathbb{E}\, e^{-aX^2} = e^{-a^{p/2}}$. In particular, for p = 1 the density of X is $\frac{e^{-1/(4x^2)}}{x^2 \sqrt{\pi}}$.

Proof. Consider the following three sequences of random variables, $\{Y_j\}_{j \ge 1}$,
$\{\theta_j\}_{j \ge 1}$, $\{g_j\}_{j \ge 1}$, such that each variable is independent of the others. For
each j ≥ 1, $Y_j$ is uniformly distributed on [0,1], $g_j$ is a standard Gaussian
and $\theta_j$ is an exponential random variable, i.e. for λ ≥ 0, $P(\theta_j > \lambda) = e^{-\lambda}$. Set
$\Gamma_j = \theta_1 + \cdots + \theta_j$. By Proposition 1.5 in [12], there is a constant C = C(p) such
that if we define for $f \in L_p$

$V(f) = C \sum_{j \ge 1} \frac{g_j}{\Gamma_j^{1/p}} f(Y_j),$

then $\mathbb{E}\, e^{iV(f)} = e^{-\|f\|_p^p}$.

Assume that the random variables $\{Y_j\}_{j \ge 1}$ and $\{\Gamma_j\}_{j \ge 1}$ are defined on a
probability space (Ω, P) and that $\{g_j\}_{j \ge 1}$ are defined on a probability space
(Ω′, P′), in which case we use the notation V(f) = V(f; ω, ω′). Define for ω ∈ Ω
a linear operator $T_\omega : L_p \to L_2(\Omega', P')$ by $T_\omega(f) = V(f; \omega, \cdot)$. Since for every
fixed ω ∈ Ω the random variable V(f; ω, ·) is Gaussian with variance $\|T_\omega(f)\|_2^2$,
for every a ∈ ℝ, $\mathbb{E}_{P'} e^{iaV(f;\omega,\cdot)} = e^{-a^2 \|T_\omega(f)\|_2^2 / 2}$. Taking expectation with respect
to P we find that $\mathbb{E}_P e^{-a^2 \|T_\omega(f)\|_2^2 / 2} = e^{-|a|^p \|f\|_p^p}$. This implies the required identity
(after rescaling $T_\omega$ by a constant factor).

The explicit distribution in the case p = 1 follows from the fact that the inverse
Laplace transform of $x \mapsto e^{-\sqrt{x}}$ is $y \mapsto \frac{e^{-1/(4y)}}{2\sqrt{\pi y^3}}$ (see for example [20]).




Metric Structures in Li: Dimension, Snowflakes, and Average Distortion 






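Theorem 3. For every $f_1, \ldots, f_n \in L_1$ there is a linear operator $T : L_1 \to L_2$
such that:

$\frac{\|T(f_i) - T(f_j)\|_2}{\|f_i - f_j\|_1} \ge \frac{1}{\sqrt{8 \log n}}, \quad 1 \le i < j \le n,$ and

$\frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left( \frac{\|T(f_i) - T(f_j)\|_2}{\|f_i - f_j\|_1} \right)^{1/2} \le 10.$

Proof. We use the notation of Lemma 2 in the case p = 1. For every ε > 0 and
a > 0, Markov's inequality gives

$P(X \le \varepsilon) = P\left(e^{-aX^2} \ge e^{-a\varepsilon^2}\right) \le e^{a\varepsilon^2 - \sqrt{a}}.$

Choosing $a = \frac{1}{4\varepsilon^4}$ the above upper bound becomes $e^{-1/(4\varepsilon^2)}$.

Consider the set

$A = \bigcap_{1 \le i < j \le n} \left\{ \frac{\|T_\omega(f_i) - T_\omega(f_j)\|_2}{\|f_i - f_j\|_1} \ge \frac{1}{\sqrt{8 \log n}} \right\} \subseteq \Omega.$

By the union bound, P(A) > 1/2, so that

$\frac{1}{P(A)}\, \mathbb{E}\left[ \mathbf{1}_A \cdot \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left( \frac{\|T_\omega(f_i) - T_\omega(f_j)\|_2}{\|f_i - f_j\|_1} \right)^{1/2} \right] \le 2\, \mathbb{E} X^{1/2} = \frac{2}{\sqrt{\pi}} \int_0^\infty \sqrt{x} \cdot \frac{e^{-1/(4x^2)}}{x^2}\, dx < 10.$

It follows that there exists ω ∈ A for which the operator $T = T_\omega$ has the desired
properties. □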
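The final numerical bound in the proof can be checked directly. The sketch below assumes that, for p = 1, the density of X is $e^{-1/(4x^2)}/(x^2\sqrt{\pi})$, the form implied by Lemma 2 through the inverse Laplace transform of $e^{-\sqrt{x}}$. Under that assumption the substitution $u = 1/(4x^2)$ gives $\mathbb{E}X^{1/2} = \Gamma(1/4)/\sqrt{2\pi}$, and a crude midpoint rule confirms the value. The function names, the truncation point and the step count are arbitrary choices made for illustration.

```python
from math import exp, gamma, pi, sqrt


def density(x):
    """Assumed density of X for p = 1."""
    return exp(-1.0 / (4.0 * x * x)) / (x * x * sqrt(pi))


def expected_sqrt(upper=400.0, steps=200_000):
    """Midpoint-rule estimate of E[X^(1/2)] on [0, upper] plus an analytic tail."""
    h = upper / steps
    body = sum(sqrt((k + 0.5) * h) * density((k + 0.5) * h) for k in range(steps)) * h
    # Beyond `upper` the exponential factor is essentially 1, so the tail is
    # approximately the integral of x^(-3/2) / sqrt(pi), i.e. 2 / sqrt(pi * upper).
    tail = 2.0 / sqrt(pi * upper)
    return body + tail


CLOSED_FORM = gamma(0.25) / sqrt(2.0 * pi)   # value of E[X^(1/2)] under the assumed density
```

Twice the closed-form value is well below 10, so the constant in the theorem is far from tight.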



Remark 1. There is nothing special about the choice of the power 1/2 in
Theorem 3. When p = 1, $\mathbb{E}X = \infty$ but $\mathbb{E}X^{1-\varepsilon} < \infty$ for every 0 < ε < 1, so we
may write the above average with the power 1 − ε replacing the exponent 1/2.
Obvious generalizations of Theorem 3 hold true for every 1 < p < 2, in which
case the average distortion is of order $C(p)(\log n)^{1/p - 1/2}$ (and the power can be
taken to be 1).



3 The Impossibility of Linear Dimension Reduction in
$L_p$, $p \neq 2$

The above method cannot yield an $O(\sqrt{\log n})$ bound on the Euclidean distortion
of n-point subsets of $L_1$. In fact, there are arbitrarily large n-point subsets of $L_1$
on which any linear embedding into $L_2$ incurs distortion at least $\sqrt{\frac{n-1}{2}}$. This
follows from the following simple lemma:
Lemma 3. For every 1 ≤ p ≤ ∞ there are arbitrarily large n-point subsets of $L_p$
on which any linear embedding into $L_2$ incurs distortion at least $\left(\frac{n-1}{2}\right)^{|1/p - 1/2|}$.

Proof. Let $w_1, \ldots, w_{2^k}$ be the rows of the $2^k \times 2^k$ Walsh matrix. Write $w_i =
\sum_{j=1}^{2^k} w_{ij} e_j$ where $e_1, \ldots, e_{2^k}$ are the standard unit vectors in $\mathbb{R}^{2^k}$. Consider the
set $A = \{0\} \cup \{w_i\}_{i=1}^{2^k} \cup \{e_i\}_{i=1}^{2^k} \subseteq \ell_p$. Let $T : \ell_p \to L_2$ be any linear operator
which is non-contracting and L-Lipschitz on A. Assume first of all that 1 ≤ p < 2.
Then:

$2^{k(1 + 2/p)} = \sum_{i=1}^{2^k} \|w_i\|_p^2 \le \sum_{i=1}^{2^k} \|T(w_i)\|_2^2 = \sum_{i=1}^{2^k} \left\| \sum_{j=1}^{2^k} w_{ij} T(e_j) \right\|_2^2$

$= \sum_{i=1}^{2^k} \sum_{j=1}^{2^k} \langle w_i, w_j \rangle \langle T(e_i), T(e_j) \rangle = 2^k \sum_{j=1}^{2^k} \|T(e_j)\|_2^2 \le 2^{2k} L^2,$

which implies that $L \ge 2^{k(1/p - 1/2)} = \left(\frac{|A| - 1}{2}\right)^{1/p - 1/2}$. When p > 2 apply the
same reasoning, with the inequalities reversed. □
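The two facts about the Walsh matrix that drive the computation above, orthogonality of the rows and $\|w_i\|_p^p = 2^k$, can be verified on small cases. The sketch below uses the Sylvester construction; the function names are hypothetical, and the lower bound is written as $2^{k|1/p - 1/2|}$, the value that follows from the chain of inequalities in the proof.

```python
def walsh(k):
    """2^k x 2^k Walsh (Sylvester-Hadamard) matrix as a list of +-1 rows."""
    W = [[1]]
    for _ in range(k):
        W = [row + row for row in W] + [row + [-x for x in row] for row in W]
    return W


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_distortion_lower_bound(k, p):
    """Lower bound 2^(k |1/p - 1/2|) on the distortion of any linear map into L_2."""
    return 2.0 ** (k * abs(1.0 / p - 0.5))
```

For p = 1 and k = 4 the point set has 33 elements and the bound equals 4, the square root of (33 − 1)/2.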



We remark that the above point set was also used by Charikar and Sahai [1] to
give a lower bound on linear dimension reduction in $L_1$. Their proof used a linear
programming argument, which doesn’t seem to be generalizable to the case
of $L_p$, p > 1. Lemma 3 formally implies their result (with a significantly simpler
proof), and in fact proves the impossibility of linear dimension reduction in any
$L_p$, $p \neq 2$. Indeed, if there were a linear operator which embeds A into $\ell_p^d$ with
distortion D then it would also be a $D \cdot d^{|1/p - 1/2|}$ embedding into $\ell_2^d$. It follows
that $D \ge \left(\frac{|A| - 1}{2d}\right)^{|1/p - 1/2|}$. Similarly, since by John’s theorem (see e.g. [21]) any
d-dimensional normed space is $\sqrt{d}$ equivalent to Hilbert space, we deduce that
there are arbitrarily large n-point subsets of $L_1$, any linear embedding of which
into any d-dimensional normed space incurs distortion at least $\sqrt{\frac{n-1}{2d}}$.



4 An Inherently High-Dimensional Doubling Metric
in $L_1$

This section is devoted to the proof of Theorem 2.

Proof (of Theorem 2). Consider the Laakso graphs, $\{G_i\}_{i=0}^\infty$, which are defined
as follows. $G_0$ is the graph on two vertices with one edge. To construct $G_i$, take
six copies of $G_{i-1}$ and scale their metric by a factor of 1/4. We glue four of them
cyclically by identifying pairs of endpoints, and attach at two opposite gluing
points the remaining two copies. See Figure 1 below.

As shown in [15], the graphs $\{G_i\}_{i=0}^\infty$ are uniformly doubling (see also [16],
for a simple argument showing they are doubling with constant 6). Moreover,
since the $G_i$'s are series parallel graphs, they embed uniformly in $L_1$ (see [22]).
We will show below that any embedding of $G_i$ in $L_p$, 1 < p ≤ 2 incurs
distortion at least $\sqrt{1 + \frac{p-1}{4} i}$. We then conclude as in [3] by observing that $\ell_1^d$
is 3-isomorphic to $\ell_p^d$ when $p = 1 + \frac{1}{\log d}$, so that if $G_i$ embeds with distortion α
in $\ell_1^d$ then $\alpha \ge \sqrt{\frac{i}{40 \log d}}$. This implies the required result since $i \approx \log |G_i|$.

The proof of the lower bound for the distortion required to embed $G_i$ into
$L_p$ is by induction on i. We shall prove by induction that whenever $f : G_i \to L_p$
is non-contracting then there exist two adjacent vertices $u, v \in G_i$ such that
$\|f(u) - f(v)\|_p \ge d_{G_i}(u, v) \sqrt{1 + \frac{p-1}{4} i}$ (observe that for $u, v \in G_{i-1}$, $d_{G_{i-1}}(u, v) =
d_{G_i}(u, v)$). For i = 0 there is nothing to prove. For i ≥ 1, since $G_i$ contains an
isometric copy of $G_{i-1}$, there are $u, v \in G_i$ corresponding to two adjacent vertices
in $G_{i-1}$ such that $\|f(u) - f(v)\|_p \ge d_{G_i}(u, v) \sqrt{1 + \frac{p-1}{4}(i - 1)}$. Let a, b be the
two midpoints between u and v in $G_i$. By Lemma 2.1 in [3],

$\|f(u) - f(v)\|_p^2 + (p - 1)\|f(a) - f(b)\|_p^2 \le \|f(u) - f(a)\|_p^2 + \|f(a) - f(v)\|_p^2 + \|f(v) - f(b)\|_p^2 + \|f(b) - f(u)\|_p^2.$

Hence:

$\max\{\|f(u) - f(a)\|_p^2, \|f(a) - f(v)\|_p^2, \|f(v) - f(b)\|_p^2, \|f(b) - f(u)\|_p^2\}$

$\ge \frac{1}{4}\left(1 + \frac{p-1}{4}(i - 1)\right) d_{G_i}(u, v)^2 + \frac{p-1}{4}\, d_{G_i}(a, b)^2$

$= \frac{1}{4}\left(1 + \frac{p-1}{4} i\right) d_{G_i}(u, v)^2$

$= \left(1 + \frac{p-1}{4} i\right) \max\{d_{G_i}(u, a)^2, d_{G_i}(a, v)^2, d_{G_i}(v, b)^2, d_{G_i}(b, u)^2\}.$

□
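The recursive definition of the Laakso graphs used in the proof above can be made concrete with a short sketch. It assumes the scaling factor 1/4 for the six copies, so that the two endpoints of every $G_i$ stay at distance 1; the function names, the vertex numbering and the edge-list representation are hypothetical choices made for illustration.

```python
import heapq
from itertools import count


def laakso(i):
    """Return (edges, s, t): weighted edge list of G_i with endpoints s = 0 and t = 1."""
    fresh = count(2)                                  # ids for new vertices

    def build(level, s, t, scale):
        if level == 0:
            return [(s, t, scale)]
        a, b, c, d = (next(fresh) for _ in range(4))
        # four copies glued cyclically (a-b-d-c-a), two more attached at a and d
        gadget = [(s, a), (a, b), (a, c), (b, d), (c, d), (d, t)]
        edges = []
        for u, v in gadget:
            edges += build(level - 1, u, v, scale / 4)
        return edges

    return build(i, 0, 1, 1.0), 0, 1


def distance(edges, src, dst):
    """Shortest-path distance by Dijkstra's algorithm."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")
```

In $G_1$ built this way, the two midpoints (vertices 3 and 4) are at distance 1/2 from each other and from both endpoints, which is the configuration to which Lemma 2.1 of [3] is applied in the induction step.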

We end this section by observing that the above approach also gives a lower
bound on the dimension required to embed expanders in $\ell_\infty$.

Proposition 1. Let G be an n-point constant degree expander which embeds in
$\ell_\infty^d$ with distortion at most α. Then $d \ge n^{\Omega(1/\alpha)}$.

Proof. By Matousek’s lower bound for the distortion required to embed expanders
in $\ell_p$ [23], any embedding of G into $\ell_p$ incurs distortion $\Omega\left(\frac{\log n}{p}\right)$. Since
$\ell_\infty^d$ is O(1)-equivalent to $\ell_{\log d}^d$ we deduce that $\alpha \ge \Omega\left(\frac{\log n}{\log d}\right)$. □

We can also obtain a lower bound on the dimension required to embed the
Hamming cube $\{0, 1\}^k$ into $\ell_\infty$. Our proof uses a simple concentration argument.
An analogous concentration argument yields an alternative proof of Proposition 1.
Figure 1. The Laakso graphs $G_0$, $G_1$ and $G_2$






Proposition 2. Assume that $\{0, 1\}^k$ embeds into $\ell_\infty^d$ with distortion α. Then
$d \ge 2^{\Omega(k/\alpha^2)}$.

Proof. Let $f = (f_1, \ldots, f_d) : \{0, 1\}^k \to \ell_\infty^d$ be a contraction such that for every
$u, v \in \{0, 1\}^k$, $\|f(u) - f(v)\|_\infty \ge \frac{1}{\alpha} d(u, v)$ (where d(·, ·) denotes the Hamming
metric). Denote by P the uniform probability measure on $\{0, 1\}^k$. Since for
every 1 ≤ i ≤ d, $f_i$ is 1-Lipschitz, the standard isoperimetric inequality on the
hypercube implies that $P(|f_i(u) - \mathbb{E} f_i| \ge k/(4\alpha)) \le e^{-\Omega(k/\alpha^2)}$. On the other
hand, if $u, v \in \{0, 1\}^k$ are such that d(u, v) = k then there exists 1 ≤ i ≤ d for
which $|f_i(u) - f_i(v)| \ge k/\alpha$, implying that $\max\{|f_i(u) - \mathbb{E} f_i|, |f_i(v) - \mathbb{E} f_i|\} \ge k/(4\alpha)$.
By the union bound it follows that $2d \cdot e^{-\Omega(k/\alpha^2)} \ge 1$, as required. □

5 Snowflake Versions of Planar Metrics 

The problem of whether there is an analog of the Johnson-Lindenstrauss dimen- 
sion reduction lemma in Lp, 1 < p < 2, is an interesting one which remains open. 
In view of the above proof and the proof in [3], a natural point set which is a 
candidate to demonstrate the impossibility of dimension reduction in Lp is the 
realization of the diamond graph in £i which appears in [2], equipped with the 
£p metric. Since this point set consists of vectors whose coordinates are either 0 
or 1 (i.e. subsets of the cube), this amounts to considering the diamond graph 
with its metric raised to the power Unfortunately, this approach cannot work; 
we show below that any planar graph whose metric is raised to the power I — e 
has Euclidean distortion 0{l/^/e). 

Given a metric space $(X, d)$ and $\varepsilon > 0$, the metric space $(X, d^{1-\varepsilon})$ is known in
geometric analysis (see e.g. [24]) as the $1 - \varepsilon$ snowflake version of $(X, d)$. Assouad's
classical theorem [18] states that any snowflake version of a doubling metric space
is bi-Lipschitz equivalent to a subset of some finite dimensional Euclidean space.
A quantitative version of this result (with bounds on the distortion and the
dimension) was obtained in [13]. The following theorem is proved by combining
embedding techniques of Rao [19] and Assouad [18]. A similar analysis is also
used in [13]. In what follows we call a metric $K_r$-excluded if it is the metric on
a subset of a weighted graph which does not admit a $K_r$ minor. In particular,
planar metrics are all $K_5$-excluded.
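
As a small numerical illustration (not taken from the paper), the following Python sketch builds the snowflake version of a finite metric given as a distance matrix and checks that it is still a metric. The helper names `snowflake` and `is_metric` are hypothetical, and the four-point path metric is just a convenient toy input.

```python
from itertools import product

def snowflake(dist, eps):
    """Return the (1 - eps)-snowflake of a finite metric given as a matrix."""
    return [[d ** (1.0 - eps) for d in row] for row in dist]

def is_metric(dist, tol=1e-12):
    """Check symmetry, zero diagonal, and the triangle inequality."""
    n = len(dist)
    for i, j in product(range(n), repeat=2):
        if abs(dist[i][j] - dist[j][i]) > tol or (i == j and dist[i][j] != 0):
            return False
    return all(dist[i][k] <= dist[i][j] + dist[j][k] + tol
               for i, j, k in product(range(n), repeat=3))

# Path metric on four points 0 - 1 - 2 - 3 with unit edges (toy example).
path = [[abs(i - j) for j in range(4)] for i in range(4)]
flake = snowflake(path, eps=0.5)   # distances become square roots
```

Raising distances to a power below one keeps the triangle inequality because $t \mapsto t^{1-\varepsilon}$ is concave and subadditive on the nonnegative reals.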

Theorem 4. For any $r \in \mathbb{N}$ there exists a constant $C(r)$ such that for every
$0 < \varepsilon < 1$, a $1 - \varepsilon$ snowflake version of a $K_r$-excluded metric embeds into $\ell_2$ with
distortion at most $C(r)/\sqrt{\varepsilon}$.

Our argument is based on the following lemma, the proof of which is con- 
tained in [19]. 

Lemma 4. For every $r \in \mathbb{N}$ there is a constant $\delta = \delta(r)$ such that for every $\rho > 0$
and every $K_r$-excluded metric $(X, d)$ there exists a finitely supported probability
distribution $\mu$ on partitions of $X$ with the following properties:

1. For every $P \in \mathrm{supp}(\mu)$, and for every $C \in P$, $\mathrm{diam}(C) < \rho$.

2. For every $x \in X$, $\mathbb{E}_\mu \sum_{C \in P:\, x \in C} d(x, X \setminus C) \ge \delta\rho$.

Observe that the sum under the expectation in (2) above actually consists 
of only one summand. 

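Proof (of Theorem 4). Let $X$ be a $K_r$-excluded metric. For each $n \in \mathbb{Z}$, we
define a map $\phi_n$ as follows. Let $\mu_n$ be the probability distribution on partitions
of $X$ from Lemma 4 with $\rho = 2^{n/(1-\varepsilon)}$. Fix a partition $P \in \mathrm{supp}(\mu_n)$. For any
$\sigma \in \{-1,+1\}^{|P|}$, consider $\sigma$ to be indexed by $C \in P$ so that $\sigma_C$ has the obvious
meaning. Following Rao [19], define

$$\phi_P(x) = \bigoplus_{\sigma \in \{-1,+1\}^{|P|}} \sqrt{\frac{1}{2^{|P|}}} \sum_{C \in P:\, x \in C} \sigma_C \cdot d(x, X \setminus C),$$

and write $\phi_n = \bigoplus_{P \in \mathrm{supp}(\mu_n)} \sqrt{\mu_n(P)}\, \phi_P$ (here the symbol $\oplus$ refers to the
concatenation operator).

Now, following Assouad [18], let $\{e_j\}_{j \in \mathbb{Z}}$ be an orthonormal basis of $\ell_2$ and
set

$$\Phi(x) = \sum_{n \in \mathbb{Z}} 2^{-n\varepsilon/(1-\varepsilon)}\, \phi_n(x) \otimes e_n.$$
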

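Claim. For every $n \in \mathbb{Z}$ and $x, y \in X$, we have $\|\phi_n(x) - \phi_n(y)\|_2 \le
2 \cdot \min\{d(x,y), 2^{n/(1-\varepsilon)}\}$. Additionally, if $d(x,y) > 2^{n/(1-\varepsilon)}$ then
$\|\phi_n(x) - \phi_n(y)\|_2 \ge \delta 2^{n/(1-\varepsilon)}$.
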
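Proof. For any partition $P \in \mathrm{supp}(\mu_n)$, let $C_x$, $C_y$ be the clusters of $P$ containing
$x$ and $y$, respectively. Note that since for every $C \in P$, $\mathrm{diam}(C) < 2^{n/(1-\varepsilon)}$, when
$d(x,y) > 2^{n/(1-\varepsilon)}$ we have $C_x \ne C_y$. In this case,

$$\|\phi_P(x) - \phi_P(y)\|_2^2 = \mathbb{E}_\sigma \left|\sigma_{C_x} d(x, X \setminus C_x) - \sigma_{C_y} d(y, X \setminus C_y)\right|^2 = d(x, X \setminus C_x)^2 + d(y, X \setminus C_y)^2.$$

It follows that

$$\|\phi_n(x) - \phi_n(y)\|_2^2 = \mathbb{E}_{\mu_n} \|\phi_P(x) - \phi_P(y)\|_2^2 \ge \mathbb{E}_{\mu_n} d(x, X \setminus C_x)^2 + \mathbb{E}_{\mu_n} d(y, X \setminus C_y)^2 \ge \delta^2 2^{2n/(1-\varepsilon)}.$$

On the other hand, for every $x, y \in X$, since $d(x, X \setminus C_x), d(y, X \setminus C_y) \le
2^{n/(1-\varepsilon)}$, we have that $\|\phi_P(x) - \phi_P(y)\|_2 \le 2 \cdot \min\{d(x,y), 2^{n/(1-\varepsilon)}\}$, hence

$$\|\phi_n(x) - \phi_n(y)\|_2 \le 2 \cdot \min\{d(x,y), 2^{n/(1-\varepsilon)}\}. \qquad \Box$$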

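To finish the analysis, let us fix $x, y \in X$ and let $m$ be such that $d(x,y)^{1-\varepsilon} \in
(2^m, 2^{m+1}]$. In this case,

$$\begin{aligned}
\|\Phi(x) - \Phi(y)\|_2^2 &= \sum_{n \in \mathbb{Z}} 2^{-2n\varepsilon/(1-\varepsilon)} \|\phi_n(x) - \phi_n(y)\|_2^2 \\
&\le 4 \sum_{n < m} 2^{2n} + 4 d(x,y)^2 \sum_{n \ge m} 2^{-2n\varepsilon/(1-\varepsilon)} \\
&\le 2^{2m+1} + 4 d(x,y)^2 \frac{2^{-2m\varepsilon/(1-\varepsilon)}}{1 - 2^{-2\varepsilon/(1-\varepsilon)}} \\
&= O(1/\varepsilon) \cdot d(x,y)^{2(1-\varepsilon)}.
\end{aligned}$$

On the other hand,

$$\|\Phi(x) - \Phi(y)\|_2 \ge 2^{-m\varepsilon/(1-\varepsilon)} \|\phi_m(x) - \phi_m(y)\|_2 \ge \delta 2^m \ge \frac{\delta}{2}\, d(x,y)^{1-\varepsilon}.$$

The proof is complete. □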



Remark 2. The $O(1/\sqrt{\varepsilon})$ upper bound in Theorem 4 is tight. In fact, for $i \approx 1/\varepsilon$,
the $1 - \varepsilon$ snowflake version of the Laakso graph $G_i$ (presented in Section 4)
has Euclidean distortion $\Omega(1/\sqrt{\varepsilon})$. To see this, let $f : G_i \to \ell_2$ be any non-contracting
embedding of $(G_i, d_{G_i}^{1-\varepsilon})$ into $\ell_2$. For $j \le i$ denote by $K_j$ the Lipschitz
constant of the restriction of $f$ to $(G_j, d_{G_i}^{1-\varepsilon})$ (as before, we think of $G_j$ as a subset
of $G_i$). Clearly $K_0 = 1$, and the same reasoning as in the proof of Theorem 2

shows that for j > 1, Kj > | . This implies that Kf > \ + ^ + . . .+ ^ = 

$\Omega(1/\varepsilon)$, as required.
Abstract. We consider the problem of computing a Nash equilibrium in
multiple-player games. It is known that there exist games, in which all the 
equilibria have irrational entries in their probability distributions [19]. 
This suggests that either we should look for symbolic representations of 
equilibria or we should focus on computing approximate equilibria. We 
show that every finite game has an equilibrium such that all the entries 
in the probability distributions are algebraic numbers and hence can be 
finitely represented. We also propose an algorithm which computes an 
approximate equilibrium in the following sense: the strategies output by 
the algorithm are close with respect to the $L_\infty$-norm to those of an exact
Nash equilibrium and also the players have only a negligible incentive 
to deviate to another strategy. The running time of the algorithm is 
exponential in the number of strategies and polynomial in the digits of 
accuracy. We obtain similar results for approximating market equilibria 
in the neoclassical exchange model under certain assumptions. 



1 Introduction 

Noncooperative game theory has been extensively used for modeling and analyz- 
ing situations of strategic interactions. One of the dominant solution concepts in 
noncooperative games is that of a Nash equilibrium [19]. Briefly, a Nash equilib- 
rium of a game is a situation in which no agent has an incentive to unilaterally 
deviate from her current strategy. A nice property of this concept is the well 
known fact that every game has at least one such equilibrium [19]. 

In this paper we consider the problem of computing a Nash equilibrium in 
finite games. The proof given by Nash for the existence of equilibria is based 
on Brouwer’s fixed point theorem and is nonconstructive. A natural algorithmic 
question is whether a Nash equilibrium can be computed efficiently. Even for a 
2-player game there is still no polynomial time algorithm. The running time of 
the known algorithms (see among others [12,13,14,15,16]) is either exponential 
or has not been determined yet (and is believed to be exponential). For m- 
person games, m > 2, the problem seems to be even more difficult [18]. Recently 
it has also been shown that finding equilibria with certain natural properties 
(e.g. maximizing payoff) is NP-hard [4,8]. The complexity of finding a single 
equilibrium has been addressed as one of the current challenges in computational 
complexity [20]. 
An issue related to the complexity of the problem is that even for 3-player
games, there exist examples [19] in which the payoff data are rational numbers 
but all the Nash equilibria have irrational entries. Hence it is still not clear 
whether an equilibrium can be finitely represented on a Turing machine. 

The problems mentioned above suggest two potential directions for research. 
The first one (perhaps more interesting from a theoretical point of view) is to 
see whether there exist alternative symbolic representations of Nash equilibria. 
Symbolic representations of numbers have been used in many areas of mathe- 
matics such as algebra or algebraic geometry as well as in algorithmic problems 
involving symbolic computations. A second, more practical goal, is to look for 
approximate equilibria. An approximate equilibrium is usually defined in the lit- 
erature as a set of strategies such that, either no player can increase her payoff 
by a nonnegligible amount if she deviates to another strategy, or the strategies, 
when seen as probability vectors, are close with respect to some norm to an 
exact Nash equilibrium. 

We will address both objectives by using the observation that Nash equilibria 
are essentially the roots of a single polynomial equation. In particular we show 
that every game has at least one Nash equilibrium for which all the entries are 
algebraic numbers, hence it can be finitely represented. The current bounds for 
the size of the representation are exponential. We also use results from the exis- 
tential theory of reals and propose an algorithm for computing an approximate 
equilibrium in time poly$(\log 1/\varepsilon, L, m^n)$. Here $m$ is the number of players, $n$ is
the total number of available strategies, $\varepsilon$ is the degree of approximation and $L$ is
the maximum bit size of the payoff data. We show that for the case of two players
we can compute an exact Nash equilibrium in time $2^{O(n)}$. This is yet another
exponential algorithm for computing an equilibrium in 2-person games. We also 
note that similar algorithms can be obtained for computing market equilibria 
under certain assumptions. 



1.1 Related Work 

Recent algorithms for approximate equilibria but only for special classes of games 
have been obtained in [10,11]. The fact that Nash equilibria are fixed points of 
a certain map [19] gives rise to many algorithmic approaches that are based 
on Scarf’s algorithm [22], which is a general algorithm for approximating fixed 
points. The worst case complexity of this algorithm is exponential in both the 
total number of strategies and the digits of accuracy [9]. A recent algorithm for 
approximate equilibria in m-player games with a provable upper bound on the 
running time is that of [17]. The running time is subexponential in the number of 
strategies and exponential in the accuracy parameter and the number of players. 
It is better than ours for games with a small number of players. Our algorithm 
is better in terms of the dependence on the digits of accuracy and for games 
with relatively small total number of strategies. Our result is also stronger in 
the sense that not only players have very small incentive to deviate from the 
approximate equilibrium, but also the set of strategies which are output are 
exponentially close to some exact Nash equilibrium. This is not ensured by the 
algorithms of [22] and [17]. More information on algorithmic approaches can be
found in the surveys [18,25]. 

The algebraic characterization of Nash equilibria as the set of solutions to a 
system of polynomial inequalities has been used before. In [24] , algebraic tech- 
niques are presented for counting the number of completely mixed equilibria. 
In [5] it is shown that every real algebraic variety is isomorphic to the set of 
completely mixed Nash equilibria of some three-person game. However repre- 
sentation and complexity issues are not addressed there. 

2 Notation and Definitions 

2.1 Nash Equilibria 

Consider a game with $m$ players. Suppose that the number of available (pure)
strategies for player $i$ is $n_i$. Let $n_0 = \max_i n_i$. An $m$-dimensional payoff matrix
$A^i$ is associated with each player. If players $1, \ldots, m$ play the pure strategies
$j_1, \ldots, j_m$ respectively, player $i$ receives a payoff equal to $A^i(j_1, \ldots, j_m)$. For
simplicity we assume that the entries of the matrices are integers, at most $L$ bits
long and $H = 2^L$ is their maximum absolute value.

A mixed strategy for player $i$ is a probability distribution over the set of
her pure strategies and will be represented by a vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{i n_i})$,
where $x_{ij} \ge 0$ and $\sum_j x_{ij} = 1$. Here $x_{ij}$ is the probability that the player will
choose her $j$th pure strategy. The support of $x_i$ ($\mathrm{Supp}(x_i)$) is the set $\{j : x_{ij} > 0\}$.
We will denote by $S_i$ the strategy space of player $i$, i.e., the $(n_i - 1)$-dimensional
unit simplex. For an $m$-tuple of mixed strategies $x = (x_1, \ldots, x_m) \in S_1 \times \cdots \times S_m$,
the expected payoff to the $i$th player is:

$$P^i(x) = \sum_{j_1=1}^{n_1} \cdots \sum_{j_m=1}^{n_m} A^i(j_1, \ldots, j_m)\, x_{1 j_1} \cdots x_{m j_m} \qquad (1)$$

Following standard notation, for a tuple of mixed strategies $x = (x_1, \ldots, x_m)$,
we will denote by $x^{-i}$ the set of strategies $\{x_j : j \ne i\}$. We will also denote by
$(x^{-i}, x_i')$ the tuple $(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_m)$, i.e., the $i$th player switches to
the strategy $x_i'$ while all other players keep playing the same strategy as in $x$.

The notion of a Nash equilibrium [19] is formulated as follows: 

Definition 1. A tuple of strategies $x = (x_1, \ldots, x_m) \in S_1 \times \cdots \times S_m$ is a
Nash equilibrium if for every player $i$ and for every mixed strategy $x_i' \in S_i$,
$P^i(x^{-i}, x_i') \le P^i(x)$.

The definition states that $x$ is a Nash equilibrium if no player has an incentive
to unilaterally defect to another strategy. It is easily seen that it is enough to
consider only deviations to pure strategies. For a player $i$, let $s_i^j$ denote her
$j$th pure strategy. Then an equivalent definition is the following: $x$ is a Nash
equilibrium if for any player $i$ and any pure strategy $s_i^j$ of player $i$: $P^i(x^{-i}, s_i^j) \le
P^i(x)$.
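
To make the pure-deviation test concrete, here is a minimal Python sketch, offered as an illustration rather than as anything from the paper. It evaluates the expected payoff of equation (1) by brute force and reports the largest gain from a unilateral pure deviation. The dictionary-based payoff representation and the names `expected_payoff`, `max_gain` and `is_nash` are assumptions made for this example.

```python
from itertools import product

def expected_payoff(A_i, x):
    """Equation (1): A_i maps a pure profile (j_1, ..., j_m) to player i's payoff,
    and x[p][j] is the probability that player p plays pure strategy j."""
    total = 0.0
    for profile in product(*(range(len(xp)) for xp in x)):
        prob = 1.0
        for player, j in enumerate(profile):
            prob *= x[player][j]
        total += A_i[profile] * prob
    return total

def pure(n, j):
    """The mixed strategy that plays pure strategy j with probability one."""
    return [1.0 if t == j else 0.0 for t in range(n)]

def max_gain(payoffs, x):
    """Largest gain any player obtains by a unilateral pure deviation."""
    best = 0.0
    for i, A_i in enumerate(payoffs):
        base = expected_payoff(A_i, x)
        for j in range(len(x[i])):
            deviated = x[:i] + [pure(len(x[i]), j)] + x[i + 1:]
            best = max(best, expected_payoff(A_i, deviated) - base)
    return best

def is_nash(payoffs, x, tol=1e-9):
    """Nash test using pure deviations only (floating point, hence the tolerance)."""
    return max_gain(payoffs, x) <= tol

# Matching pennies as a toy two-player game.
A1 = {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1}
A2 = {profile: -value for profile, value in A1.items()}
uniform = [[0.5, 0.5], [0.5, 0.5]]
```

The brute-force sum has $n_1 \cdots n_m$ terms, so this is only meant for tiny games.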

Similarly we can formalize the notion of an $\varepsilon$-Nash equilibrium (or simply
$\varepsilon$-equilibrium), in which players have only a small incentive to deviate:

Definition 2. For $\varepsilon > 0$, a tuple of strategies $x = (x_1, \ldots, x_m)$ is an $\varepsilon$-Nash
equilibrium if for every player $i$ and for every pure strategy $s_i^j$, $P^i(x^{-i}, s_i^j) \le
P^i(x) + \varepsilon$.

Another notion of approximation is that of $\varepsilon$-closeness:

Definition 3. A point $x = (x_1, \ldots, x_m) \in S_1 \times \cdots \times S_m$ is $\varepsilon$-close to a point
$y \in S_1 \times \cdots \times S_m$ if $\|x_i - y_i\|_\infty \le \varepsilon$ for all $i = 1, \ldots, m$.

Note that an $\varepsilon$-equilibrium is not necessarily close to a real Nash equilibrium.
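
The gap between the two notions can be seen on a toy example of our own (it is not from the paper): in the two-player game below the row player strictly prefers her first row, so the unique equilibrium puts all weight there, yet playing the second row is a 0.01-equilibrium at distance 1 from it. The function names are hypothetical.

```python
def eps_nash_gap(A, B, x, y):
    """Largest gain from a unilateral pure deviation in the bimatrix game (A, B)."""
    m, n = len(A), len(A[0])
    u1 = sum(x[i] * A[i][j] * y[j] for i in range(m) for j in range(n))
    u2 = sum(x[i] * B[i][j] * y[j] for i in range(m) for j in range(n))
    gain1 = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m)) - u1
    gain2 = max(sum(x[i] * B[i][j] for i in range(m)) for j in range(n)) - u2
    return max(gain1, gain2)

def linf_distance(x, y):
    """Smallest eps for which profile x is eps-close to profile y (Definition 3)."""
    return max(max(abs(a - b) for a, b in zip(xi, yi)) for xi, yi in zip(x, y))

# Row player: two rows with payoffs 1 and 0.99; column player has one strategy.
A = [[1.0], [0.99]]
B = [[0.0], [0.0]]
nash = ([1.0, 0.0], [1.0])       # unique equilibrium: play the first row
almost = ([0.0, 1.0], [1.0])     # second row: tiny incentive to deviate
gap = eps_nash_gap(A, B, *almost)
far = linf_distance(almost, nash)
```

Here `gap` is about 0.01 while `far` equals 1, which is why the paper's algorithm aims for closeness to an exact equilibrium and not only for a small incentive to deviate.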



2.2 Algebra 

We give some definitions of basic algebraic concepts that we are going to use in 
the later sections. For a more detailed exposition we refer the reader to [3]. 

Definition 4. A real number a is an algebraic number if there exists a univari- 
ate polynomial P with integer coefficients such that P{a) = 0. 



Definition 5. An ordered field R is a real closed field if 

1. every positive element $x \in R$ is a square (i.e. $x = y^2$ for some $y \in R$).

2. every univariate polynomial of odd degree with coefficients in R has a root 
in R. 

Obviously the real numbers are an example of a real closed field. 

3 Nash Equilibria Are Roots of a Polynomial 

A Nash equilibrium is a solution of the following system of polynomial inequalities
and equalities:

$$\begin{aligned}
x_{ij} &\ge 0, & i &= 1, \ldots, m,\ j = 1, \ldots, n_i \\
\sum_{j=1}^{n_i} x_{ij} &= 1, & i &= 1, \ldots, m \\
P^i(x^{-i}, s_i^j) &\le P^i(x), & i &= 1, \ldots, m,\ j = 1, \ldots, n_i
\end{aligned} \qquad (2)$$

Let $n = \sum_{i=1}^m n_i$. The system has $n$ variables and $2n + m = O(n)$ multilinear
constraints. By adding slack variables we can convert every constraint to
an equation, where each polynomial is of degree at most $m$ (the degree of a
polynomial is the maximum total degree of its monomials). Note that the slack
variables are squared so that we do not have to add any more constraints for
their nonnegativity:
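$$\begin{aligned}
x_{ij} - \beta_{ij}^2 &= 0, & i &= 1, \ldots, m,\ j = 1, \ldots, n_i \\
\sum_{j=1}^{n_i} x_{ij} - 1 &= 0, & i &= 1, \ldots, m \\
P^i(x) - P^i(x^{-i}, s_i^j) - \delta_{ij}^2 &= 0, & i &= 1, \ldots, m,\ j = 1, \ldots, n_i
\end{aligned} \qquad (3)$$

We can now combine all the polynomial equations into one by taking the
sum of squares ($P_1 = 0$ and $P_2 = 0$ is equivalent to $P_1^2 + P_2^2 = 0$). Therefore we
have the following polynomial which we will refer to as the polynomial of the
game, $\Phi(A^1, \ldots, A^m)$:

$$\Phi(A^1, \ldots, A^m) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left(x_{ij} - \beta_{ij}^2\right)^2 + \sum_{i=1}^{m} \Big(\sum_{j=1}^{n_i} x_{ij} - 1\Big)^2 + \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left(P^i(x) - P^i(x^{-i}, s_i^j) - \delta_{ij}^2\right)^2 \qquad (4)$$
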

Claim. A"^) has degree 2m, 0{n) variables, monomials and max- 

imum absolute value of its coefficients 0{nH^). 
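
The sum-of-squares construction can be checked numerically on a tiny example. The sketch below is our own illustration for two players only: it evaluates the sum of squared slack equations at a given strategy pair, choosing the real slack values that make each square as small as possible. The names `payoffs`, `best_slacks` and `game_polynomial`, and the use of matching pennies, are assumptions for the example.

```python
import math

def payoffs(A, B, x, y):
    """Expected payoffs and pure-deviation payoffs in the bimatrix game (A, B)."""
    m, n = len(x), len(y)
    p1 = sum(x[i] * A[i][j] * y[j] for i in range(m) for j in range(n))
    p2 = sum(x[i] * B[i][j] * y[j] for i in range(m) for j in range(n))
    dev1 = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    dev2 = [sum(x[i] * B[i][j] for i in range(m)) for j in range(n)]
    return (p1, dev1), (p2, dev2)

def best_slacks(A, B, x, y):
    """Real slack values that bring each squared term as close to zero as possible."""
    beta = [[math.sqrt(max(v, 0.0)) for v in s] for s in (x, y)]
    delta = [[math.sqrt(max(p - d, 0.0)) for d in dev]
             for p, dev in payoffs(A, B, x, y)]
    return beta, delta

def game_polynomial(A, B, x, y, beta, delta):
    """Sum of the squared left-hand sides of the slack-variable equations."""
    total = 0.0
    for s, b in zip((x, y), beta):
        total += sum((v - bv ** 2) ** 2 for v, bv in zip(s, b))  # x_ij - beta_ij^2
        total += (sum(s) - 1.0) ** 2                              # sum_j x_ij - 1
    for (p, dev), dl in zip(payoffs(A, B, x, y), delta):
        total += sum((p - d - dv ** 2) ** 2 for d, dv in zip(dev, dl))  # payoff gap
    return total

A = [[1, -1], [-1, 1]]   # matching pennies, row player
B = [[-1, 1], [1, -1]]   # matching pennies, column player
mixed = ([0.5, 0.5], [0.5, 0.5])
pure = ([1.0, 0.0], [1.0, 0.0])
at_equilibrium = game_polynomial(A, B, *mixed, *best_slacks(A, B, *mixed))
off_equilibrium = game_polynomial(A, B, *pure, *best_slacks(A, B, *pure))
```

At the mixed equilibrium the value is zero up to rounding, while at the pure profile the column player's profitable deviation leaves a residual that no real slack can cancel.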

3.1 Finite Representation of Nash Equilibria 

Irrationality is not necessarily an obstacle towards obtaining a finite represen- 
tation for a Nash equilibrium. For example, a real algebraic number a can be 
uniquely specified by the irreducible polynomial with integer coefficients, P, for 
which P(a) = 0 and an interval which isolates the root a from the other roots of 
P. In the next Theorem we show that every game has a Nash equilibrium that 
can be finitely represented. The proof is based on a deep result from the theory 
of real closed fields known as the transfer principle [3] . We also need to use the 
fact that equilibria always exist. We are not aware if there is an alternative way 
of proving Theorem 1. The original topological proof of existence by Nash via 
Brouwer’s fixed point theorem, though powerful enough to guarantee an equilib- 
rium, does not seem to give any further information on the algebraic properties 
of the equilibria. 
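
The representation of an algebraic number by a polynomial and an isolating interval can be sketched as follows. This is an illustrative Python fragment under our own assumptions: the root is simple, the polynomial changes sign across the interval, and the names `evaluate` and `refine` are hypothetical. It uses exact rational arithmetic so the interval can be shrunk to any desired width.

```python
from fractions import Fraction

def evaluate(coeffs, t):
    """Evaluate an integer polynomial (highest degree first) by Horner's rule."""
    value = Fraction(0)
    for c in coeffs:
        value = value * t + c
    return value

def refine(coeffs, lo, hi, width):
    """Shrink an isolating interval [lo, hi] of a simple root by bisection until
    its length is at most `width`. Assumes a sign change on [lo, hi]."""
    lo, hi = Fraction(lo), Fraction(hi)
    s_lo = evaluate(coeffs, lo)
    while hi - lo > width:
        mid = (lo + hi) / 2
        s_mid = evaluate(coeffs, mid)
        if s_mid == 0:
            return mid, mid
        if (s_mid > 0) == (s_lo > 0):
            lo, s_lo = mid, s_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return lo, hi

# The square root of 2: root of x^2 - 2 isolated by the interval [1, 2].
sqrt2 = ([1, 0, -2], Fraction(1), Fraction(2))
lo, hi = refine(*sqrt2, Fraction(1, 1000))
```

The pair (polynomial, interval) is a finite description of the irrational number, and the interval can be refined on demand to produce as many digits as needed.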

Theorem 1. For every finite game there exists a Nash equilibrium $x = (x_1, \ldots,
x_m)$ such that every entry in the probability distributions $x_1, \ldots, x_m$ is an algebraic
number.

Proof. Given a game $(A^1, \ldots, A^m)$, the set of its Nash equilibria is the set of roots
of the corresponding polynomial $\Phi$ (excluding the slack variables). By Nash's
proof [19] we know that the equation $\Phi(A^1, \ldots, A^m) = 0$ has a solution over the
reals. Consider the field of the real algebraic numbers $R_{alg}$. It is known that
$R_{alg}$ is a real closed field [3]. The Tarski-Seidenberg theorem, also known as the
transfer principle (see [3]), states that for two real closed fields $R_1$, $R_2$ such that
$R_2 \subset R_1$, a polynomial with coefficients in $R_2$ has a root in $R_2$ if and only if it
has a root in $R_1$. The real numbers form a real closed field which contains $R_{alg}$.
Since the coefficients of $\Phi$ are integers, it follows immediately that there exists
a Nash equilibrium in $R_{alg}$.
A natural question is whether there are reasonable upper bounds for the
degree and the coefficient size of the polynomials that represent the entries of 
an equilibrium. The known upper bounds are exponential. In particular, it fol- 
lows by [3] [Chapter 13] and by the Claim in Section 3 that the degrees of the 
polynomials will be and the coefficient size will be 0{L + logn)m^^'^\ 

3.2 Algorithmic Implications 

A more practical goal to pursue is to compute an approximate equilibrium. For 
this we will use as a subroutine a decision algorithm for the existential theory of 
reals. 

A special case of the decision problem for the existential theory of reals is to 
decide whether the equation $P(x_1, \ldots, x_k) = 0$ has a solution over the reals. Here
$P$ is a polynomial in $k$ variables of degree $d$ and with integer coefficients. The
best upper bound for the complexity of the problem is $d^{O(k)}$, as provided by the
algorithms of Basu et al. [2] and Renegar [21].

Theorem 2. For an $m$-person game, $m \ge 2$, and for $0 < \varepsilon < 1$, there is an
algorithm which runs in time poly$(\log 1/\varepsilon, L, m^n)$ and computes an $m$-tuple of
strategies $x \in S_1 \times \cdots \times S_m$ such that:

1. $x$ is $\varepsilon/d$-close to some Nash equilibrium $y$, where $d = 2^{m+1} n^m H$.

2. \P^{x) — P*(y)l < for all i = I, ■■■, m. 

3. $x$ is an $\varepsilon$-Nash equilibrium.

To prove Theorem 2, we need the following Lemma: 

Lemma 1. Let $y = (y_1, \ldots, y_m)$ be a Nash equilibrium. Let $x = (x_1, \ldots, x_m)$ be
$\Delta$-close to $y$, where $\Delta < 1$. Then:

1. $x$ is an $\varepsilon$-Nash equilibrium for $\varepsilon = 2^{m+1} n^m H \Delta$.

2. \P^{x) — P*(y)l < for all i = 1, ..., m. 

Proof. We give a sketch of the proof. Since $x$ is $\Delta$-close to $y$, each $x_i$ can be
written in the form $x_i = y_i + c_i$, where $c_i = (c_{i1}, \ldots, c_{i n_i})$ and $|c_{ij}| \le \Delta$. For
condition 1, we need to prove that for every player $i$, $P^i(x) \ge P^i(x^{-i}, s_i^j) - \varepsilon$,
for every pure strategy $s_i^j$. Fix a pure strategy $s_i^j$. Then:

$$P^i(x) = \sum_{j_1} \cdots \sum_{j_m} A^i(j_1, \ldots, j_m)(y_{1,j_1} + c_{1,j_1}) \cdots (y_{m,j_m} + c_{m,j_m}) = P^i(y) + E_1 + \cdots + E_{2^m - 1}$$

where each term $E_l$ is an $m$-fold sum. Since $y$ is a Nash equilibrium we have:

$$P^i(x) \ge P^i(y^{-i}, s_i^j) + \sum_l E_l = P^i(x^{-i}, s_i^j) + \sum_l E_l + \sum_l F_l$$

where each $F_l$ is an $(m-1)$-fold sum similar to the $E_l$ terms. By performing
some simple calculations we can actually show that $|\sum_l E_l + \sum_l F_l| \le \varepsilon$. Hence
$\sum_l E_l + \sum_l F_l \ge -\varepsilon$. Due to lack of space we omit the details for the final version.
The second claim can also be verified along the same lines.
From now on, let A be an algorithm that decides whether $P(x_1, \ldots, x_k) = 0$
has a solution over the reals in time $d^{O(k)}$, for a degree $d$ polynomial $P$ (either
the algorithm of [2] or [21] will do).

Proof of Theorem 2: By Lemma 1, we only need to find an $m$-tuple $x$
such that $x$ is $\varepsilon/d$-close to some Nash equilibrium $y$. Let $\Phi(A^1, \ldots, A^m)$ be the
corresponding polynomial of the game. By the Claim in Section 3, the time to
compute the coefficients of all the monomials of $\Phi$, given the payoff matrices,
is poly$(m^n)$. We can now use A combined with binary search
to compute a rational approximation of some root. Suppose we start with the
variable $x_{11}$. We can add two more constraints to $\Phi$ expressing the fact that
$x_{11} \in [0, 1/2]$. We then run A for the new polynomial and if the answer is yes
we know that there exists an equilibrium with $x_{11} \in [0, 1/2]$. We can replace the
constraints that we added with the ones corresponding to $x_{11} \in [0, 1/4]$. If the
answer is no then there exists an equilibrium with $x_{11} \in [1/4, 1/2]$, hence we
can continue our binary search in that interval. Proceeding in this manner we
will find an interval $I_{11}$ with length at most $\varepsilon/(n_1 d)$. For this we need to run
the algorithm A $O(\log(n_1 d/\varepsilon)) = O(\log 1/\varepsilon + m + m \log n + L)$ = poly$(\log 1/\varepsilon, L, m, n)$
times. We will then add to $\Phi$ the constraints corresponding to $x_{11} \in I_{11}$
and we will go on to the next variable. When we are done with the variable
$x_{1,n_1-1}$, the interval $I_{1,n_1}$ for $x_{1,n_1}$ is also determined. This is because $x_{1,n_1}$ should
be equal to $1 - \sum_{j < n_1} x_{1j}$, since $x_1$ is a probability distribution. Therefore the
length of $I_{1,n_1}$ will be at most $\varepsilon/d$. Hence by the end of this step we know that we
can select a probability distribution $x_1$ for the first player such that $\|x_1 - y_1\|_\infty \le
\varepsilon/d$ for some Nash equilibrium $y$. We continue the procedure to determine an
interval $I_{ij}$ for every variable $x_{ij}$. We can then output a rational number in $I_{ij}$
for each variable so as to ensure that $x_1, \ldots, x_m$ are probability distributions.
Note that by the end we have only added $O(n)$ additional slack variables and
constraints. Therefore the total running time will be poly$(\log 1/\varepsilon, L, m^n)$.
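
The binary search for a single variable can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the real decision procedure A is replaced by a toy `oracle` over a known finite set of solution values, and the name `locate` is hypothetical.

```python
from fractions import Fraction

def locate(oracle, width):
    """Shrink [0, 1] to an interval of length at most `width` that still
    contains a feasible value. `oracle(lo, hi)` answers whether some solution
    has the variable in [lo, hi]; it stands in for algorithm A."""
    lo, hi = Fraction(0), Fraction(1)
    calls = 0
    while hi - lo > width:
        mid = (lo + hi) / 2
        calls += 1
        if oracle(lo, mid):
            hi = mid   # a solution exists in the lower half
        else:
            lo = mid   # [lo, hi] contains a solution, so it lies in [mid, hi]
    return lo, hi, calls

# Toy stand-in oracle: the "solutions" are a known finite set of values.
solutions = [Fraction(1, 3), Fraction(5, 7)]
oracle = lambda lo, hi: any(lo <= s <= hi for s in solutions)
lo, hi, calls = locate(oracle, Fraction(1, 1024))
```

Each call halves the interval, so reaching width $w$ takes about $\log_2(1/w)$ oracle calls, which is where the $O(\log(n_1 d/\varepsilon))$ count in the proof comes from.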

An exact algorithm for 2-person games. We can show that for 2-person 
games we can compute an exact Nash equilibrium using algorithm A as a subrou- 
tine. The crucial observation is that for 2-person games, if we know the support 
of the Nash equilibrium strategies, the exact strategies can be computed by solv- 
ing a linear program. This is true because an equilibrium strategy for player 2 
equalizes the payoff that player 1 receives for every pure strategy in her support 
and vice versa. Hence we can write a linear program and compute the Nash 
equilibrium with the given support since all the constraints are now linear. By 
adding constraints of the form $x_{ij} = 0$ and by running A a linear number of
times, we can identify the support of some Nash equilibrium. 

Theorem 3. There exists an algorithm that runs in time $2^{O(n)}$ and computes
an exact Nash equilibrium.

Due to lack of space we omit the proof. This is yet another exponential 
algorithm for computing an equilibrium in 2-person games. An upper bound 
on the complexity of the problem can be obtained by the naive algorithm that
tries all possible pairs of supports for the two players, which is $O(2^n LP^n) =
2^{O(n)}$, where $LP^n$ is the time to solve a linear program with $O(n)$ variables and
$O(n)$ constraints. Our algorithm achieves the same asymptotic bound but is in
fact worse since the constant in the exponent is bigger than two. However we 
would still like to bring it to the attention of the community firstly because it 
is a different approach that has not been addressed before to the best of our 
knowledge and secondly because a future improvement in decision algorithms 
for low degree polynomial equations would directly imply an improvement in 
our algorithm too. 
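
For comparison, the naive support-enumeration idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of Theorem 3: it only tries supports of equal size (which suffices for nondegenerate games), and it solves exact linear systems with `Fraction` arithmetic instead of a linear program. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve(M, b):
    """Solve the square system M z = b exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [v * inv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def equalizer(payoff, rows, cols):
    """Distribution on `cols` giving every strategy in `rows` the same value u."""
    k = len(cols)
    M = [[payoff[r][c] for c in cols] + [-1] for r in rows]  # payoff . z - u = 0
    M.append([1] * k + [0])                                  # probabilities sum to 1
    sol = solve(M, [0] * k + [1])
    return None if sol is None else (sol[:k], sol[k])

def support_enumeration(A, B):
    """Return one equilibrium (x, y) of the bimatrix game (A, B), or None."""
    m, n = len(A), len(A[0])
    Bt = [[B[i][j] for i in range(m)] for j in range(n)]
    for k in range(1, min(m, n) + 1):
        for I in combinations(range(m), k):
            for J in combinations(range(n), k):
                ry, rx = equalizer(A, I, J), equalizer(Bt, J, I)
                if ry is None or rx is None:
                    continue
                (ys, u), (xs, v) = ry, rx
                if min(ys) < 0 or min(xs) < 0:
                    continue
                x, y = [Fraction(0)] * m, [Fraction(0)] * n
                for i, p in zip(I, xs):
                    x[i] = p
                for j, p in zip(J, ys):
                    y[j] = p
                # No pure strategy may beat the equalized value.
                if (all(sum(A[i][j] * y[j] for j in range(n)) <= u for i in range(m))
                        and all(sum(B[i][j] * x[i] for i in range(m)) <= v
                                for j in range(n))):
                    return x, y
    return None
```

On matching pennies this returns the uniform strategies for both players; the number of support pairs examined grows exponentially with the number of strategies, matching the $2^{O(n)}$ flavour of the bound discussed above.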

4 Approximation of Economic Equilibria 

Similar algorithms can be obtained for computing market equilibria in exchange 
economies as well as in other economic models. We will briefly mention a special 
case of the neoclassical exchange model. More information on the general model 
can be found in [23]. 

Consider a market of $m$ agents and $n$ commodities (or goods). Each agent $i$
has an initial endowment $e_i \in R^n_+$. A continuous, strictly concave and increasing
utility function $u_i : R^n_+ \to R_+$ is associated with each agent.

Given a price vector $p \in R^n_+$, there exists a unique allocation of goods $x$ to
each agent $i$ that maximizes her happiness subject to her spending constraints
($px \le pe_i$). Given a price vector $p$ and an agent $i$ we denote by $S_i(p)$ the desired
allocation:

$$S_i(p) = (S_{i1}(p), \ldots, S_{in}(p)) = \arg\max u_i(x) \ \text{ s.t. } \ px \le pe_i \qquad (5)$$

For commodity $j$, let $D_j(p)$ be the total demand for $j$, i.e., $D_j(p) = \sum_i S_{ij}(p)$.
Finally $D(p) = (D_1(p), \ldots, D_n(p))$ will be called the demand function. We will
make the assumption that for each commodity $j$ the demand $D_j(p_1, \ldots, p_n)$ is a
polynomial of degree $d$.

A price vector at which the market clears (goods can be exchanged such that
all the agents maximize their total happiness) is called a market equilibrium. It is
easy to see that such a vector will satisfy the conditions: $p \ge 0$, $D(p) - \sum_i e_i \le 0$,
$p(D(p) - \sum_i e_i) = 0$.
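
The three market-clearing conditions translate directly into a check. The Python sketch below is only illustrative: the function name `is_market_equilibrium` is hypothetical, and the linear demand function used in the example is made up for the test and is not derived from any utility functions.

```python
def is_market_equilibrium(p, demand, endowments, tol=1e-9):
    """Check p >= 0, D(p) - sum_i e_i <= 0, and p . (D(p) - sum_i e_i) = 0."""
    n = len(p)
    supply = [sum(e[j] for e in endowments) for j in range(n)]
    excess = [d - s for d, s in zip(demand(p), supply)]
    return (all(pj >= -tol for pj in p)
            and all(z <= tol for z in excess)
            and abs(sum(pj * z for pj, z in zip(p, excess))) <= tol)

# Toy polynomial demand (an assumption for illustration): D(p) = (2 p_2, 2 p_1).
toy_demand = lambda p: [2 * p[1], 2 * p[0]]
endowments = [[1, 0], [0, 1]]   # two agents, one unit of each good in total
```

With these toy inputs the price vector (1/2, 1/2) clears the market, while (1/4, 3/4) leaves excess demand for the first good.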

Without loss of generality we can assume that the price vector lies on a
simplex, i.e., $\sum_i p_i = 1$. That such an equilibrium always exists follows from the
celebrated Arrow-Debreu theorem [1], which in turn is based on Kakutani's fixed
point theorem. By using the same argument as in Theorem 1 we can show that
there is always an equilibrium in which all the prices are algebraic numbers.
Concerning the complexity of the problem, since all the equations above involve
polynomials of degree at most $d + 1$ we have that:

Theorem 4. For any $\varepsilon > 0$, there is an algorithm that runs in time poly$(\log 1/\varepsilon,
d^n)$, and computes a vector $p$ such that $p$ is $\varepsilon$-close to a market equilibrium.

This improves the bound that can be obtained by Scarf’s algorithm, which is 
exponential in both $\log 1/\varepsilon$ and $n$. More efficient algorithms for market equilibria
have been recently obtained (see among others [6,7]) but only for linear utilities. 
5 An Application: Systems of Polynomial Inequalities

Much of the research on equilibria in economic models has focused on the al- 
gorithmic problem of computing an equilibrium. A common approach has been 
to reduce the question to an already known and studied problem (e.g. fixed 
point approximations, linear and nonlinear complementarity problems, systems 
of polynomial equations and many others). In this section we would like to pro- 
pose an alternative viewpoint and take advantage of the fact that Nash or market 
equilibria always exist. In particular, if a problem can be reduced to the existence 
of an equilibrium in a game or market, then we are guaranteed that a solution 
exists. As an example, we give the following theorem: 

Theorem 5. Let $A$ be an $n \times n$ matrix and $a_i$ be the $i$-th row of $A$. Let $S \subseteq
\{1, \ldots, n\}$. Then the following system of inequalities in $n$ variables $x = (x_1, \ldots, x_n)$

$$x^T A x - a_i x \ge 0, \quad i \in S$$

has a nonzero solution. In fact it has a probability distribution as a solution.

Proof. Consider the symmetric game $(A, A^T)$. It is known that every symmetric
game has an equilibrium in which both players play the same strategy. The
inequalities of the system correspond to the constraints that if both players play
strategy $x$, a deviation to a pure strategy $i$, for $i \in S$, does not make a player
better off.
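
A quick numerical check of the system in Theorem 5 can be written as follows. This Python sketch is our own illustration: the function name `residuals` is hypothetical and the rock-paper-scissors matrix is just a convenient symmetric game whose symmetric equilibrium is the uniform distribution.

```python
from fractions import Fraction

def residuals(A, x):
    """Return x^T A x - a_i x for every row a_i of A."""
    n = len(A)
    row_values = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    quad = sum(x[i] * row_values[i] for i in range(n))
    return [quad - r for r in row_values]

# Rock-paper-scissors payoff matrix for the row player.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
third = Fraction(1, 3)
at_uniform = residuals(rps, [third, third, third])
at_pure = residuals(rps, [1, 0, 0])
```

The uniform distribution satisfies every inequality (all residuals are zero), whereas the pure strategy in the second call violates one of them, so not every probability distribution is a solution.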

Deciding whether a set of polynomial equations and inequalities has a solution 
(or a non-trivial solution) has been an active research topic. Similar theorems 
can be obtained for any system that corresponds to partial constraints for the 
existence of Nash equilibria or market equilibria. We do not know if an algebraic 
proof of Theorem 5 is already known. We believe that the existence of equilibria 
in games and markets can yield a way of providing simple proofs for the existence 
of solutions in certain systems of polynomial inequalities. 
Abstract. Given an undirected graph $G = (V, E)$ and a source vertex
$s \in V$, the $k$-traveling repairman (KTR) problem, also known as
the minimum latency problem, asks for k tours, each starting at s and 
covering all the vertices (customers) such that the sum of the latencies 
experienced by the customers is minimum. Latency of a customer p is 
defined to be the distance (time) traveled before visiting p for the first 
time. Previous literature on the KTR problem has considered the version 
of the problem in which the repairtime of a customer is assumed to be 
zero for latency calculations. We consider a generalization of the problem 
in which each customer has an associated repairtime. In this paper, we 
present constant factor approximation algorithms for this problem and 
its variants. 



1 Introduction 

Given a finite metric on a set of vertices $V$ and a source vertex $s \in V$, the
$k$-traveling repairman (KTR) problem, a generalization of the metric traveling
repairman problem (also known as the minimum latency problem, the delivery
man problem, and the school bus-driver problem), asks for $k$ tours, each starting
at $s$ (depot) and covering all the vertices (customers) such that the sum of the
latencies experienced by the customers is minimum. Latency of a customer $p$ is
defined to be the distance traveled before visiting $p$ for the first time. The KTR
problem is NP-hard [10], even for $k = 1$. The problem remains NP-hard even for
weighted trees [11].

The KTR problem with k = 1 is known as the minimum latency problem 
(MLP) in the literature. The first constant factor approximation for MLP was 
given by Blum et al. [3]. Goemans and Kleinberg [8] improved the ratio for MLP
to $3.59\alpha$. In the following discussion, let $\alpha$ be the best achievable approximation
ratio for the $i$-MST problem. The current best approximation ratio for the
$i$-MST problem is $(2 + \varepsilon)$, due to Arora and Karakostas [2], an improvement over
the previous best ratio of 3, due to Garg [7]. Archer, Levin and Williamson [1]
presented faster algorithms for MLP with a slightly better approximation ratio
of 7.18. Recently, Chaudhuri et al. [4] have reduced the ratio by a factor of 2,
to 3.59. They build on Archer, Levin and Williamson’s techniques with the key 
improvement being that they bound the cost of their i-trees by the cost of a 
minimum cost path visiting i nodes, rather than twice the cost of a minimum 
cost tree spanning i nodes. 

For the KTR problem, Fakcharoenphol, Harrelson and Rao [6] presented an
$8.497\alpha$-approximation algorithm. Their ratio was recently improved to $2(2 + \alpha)$
by Chekuri and Kumar [5]. For a multidepot variant of the KTR problem, in
which $k$ repairmen start from $k$ different starting locations, Chekuri and Kumar
[5] presented a $6\alpha$-approximation algorithm. Recently, Chaudhuri et al. [4]
have reduced the ratio to 6 for both the KTR problem and its multidepot variant. 



1.1 Problem Statement 

The Generalized KTR Problem. Literature on the KTR problem shows 
that all the results thus far are based on the assumption that the repairtime of 
a customer is zero for latency calculations. In this paper, we consider a general- 
ization of the KTR problem (GKTR), the problem definition of which may be 
formalized as follows. 

GKTR: Given a metric defined on a set of vertices, $V$, a source vertex $s \in V$
and a positive number $k$. Also given is a non-negative number for each vertex
$v \in V - \{s\}$, denoting the repairtime at $v$. The objective is to find $k$ tours, each
starting at $s$, covering all the vertices such that the sum of the latencies of all
the vertices is minimum.
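
The GKTR objective can be illustrated with a short Python sketch. It rests on an assumption that the text above does not spell out: we take the latency of a customer to be the time at which her repairman arrives, so that repairtimes of earlier customers on the same tour delay later ones. The function name `total_latency` and the three-vertex instance are hypothetical.

```python
def total_latency(dist, repair, tours, source=0):
    """Sum of arrival times over all customers. Each tour lists the customers
    served by one repairman, in order, starting from `source`."""
    total = 0
    for tour in tours:
        clock, here = 0, source
        for v in tour:
            clock += dist[here][v]   # travel to the next customer
            total += clock           # latency of v = arrival time (assumed)
            clock += repair[v]       # repairing at v delays later customers
            here = v
    return total

# Source 0 and customers 1, 2 on a line; customer 1 needs a long repair.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
repair = [0, 5, 0]
```

With one repairman, serving customer 1 first costs 8 while serving customer 2 first costs 5, even though the first order is better when all repairtimes are zero; with two repairmen the total drops to 3.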

It is easy to see that the GKTR problem resembles most real-life situations, 
one of which is that the repairmen have to spend some time at each customer’s 
location, say, for the repair or installation of equipment. This applies even for 
a deliveryman who spends some time delivering goods. Hence, it is natural to 
formulate the repairman problem with repairtimes. 





Fig. 1. (a) Original graph G. (b) Transformed graph G*. (c) Optimal tour for G*.

At first glance, it looks as though the GKTR problem can be reduced to the KTR
problem in a straightforward manner. A closer look reveals that such a reduction
might not be possible without a compromise in the approximation ratio. One trivial
idea would be to incorporate the repairtimes associated with vertices into edge
weights (where the weight of an edge represents the time to traverse that edge),
which can be done by boosting the edge weights as follows: for every edge $e$ incident
on vertices $i$ and $j$ in the given graph $G$, increase the weight (or distance) of $e$ by
the sum of $r_i/2$ and $r_j/2$, where $r_i$ and $r_j$ are the repairtimes of $i$ and $j$,
respectively. Fig. 1 depicts such a transformation for a sample instance with $k = 1$.
The resultant graph $G^*$ after such a transformation will still obey the triangle
inequality, which allows us to use any of the KTR algorithms, say, with approximation
guarantee $\beta$. The solution obtained would be a $\beta$-approximation for the modified
graph $G^*$. However, the obtained solution will not be a $\beta$-approximation for the
original problem $G$. The reason is that the lower bounds for the problems defined on
$G$ and $G^*$ are different: the latency of a customer $v$ in an optimal solution to $G^*$
includes half of $v$'s repairtime, while this is not the case in an optimal solution
to $G$. Although it may look as though an optimal solution to $G$ will be off by just a
small constant when compared to an optimal solution to $G^*$, in reality the gap could
be arbitrarily large.
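The edge-boosting idea can be sketched in a few lines. The code below builds the boosted weights $|ij| + r_i/2 + r_j/2$ for a three-vertex toy instance (assumed data), checks the triangle inequality, and compares arrival times along one tour in $G^*$ with the latencies in $G$. The helper names are hypothetical.

```python
from itertools import permutations

def boosted_weights(dist, repair):
    """Return w*(i, j) = |ij| + r_i/2 + r_j/2 for every pair in dist."""
    return {(i, j): d + repair[i] / 2 + repair[j] / 2
            for (i, j), d in dist.items()}

def obeys_triangle_inequality(weights, vertices):
    """Check w(a, c) <= w(a, b) + w(b, c) for all distinct a, b, c."""
    def w(u, v):
        return weights[(u, v)] if (u, v) in weights else weights[(v, u)]
    return all(w(a, c) <= w(a, b) + w(b, c)
               for a, b, c in permutations(vertices, 3))

# Assumed toy data: depot s (repairtime 0) and two customers.
dist = {("s", "a"): 1, ("s", "b"): 3, ("a", "b"): 2}
repair = {"s": 0, "a": 2, "b": 5}
g_star = boosted_weights(dist, repair)
metric_ok = obeys_triangle_inequality(g_star, ["s", "a", "b"])

# Tour s -> a -> b: arrival times in G* versus latencies in G.
latency_star = {"a": g_star[("s", "a")],
                "b": g_star[("s", "a")] + g_star[("a", "b")]}
latency_g = {"a": 1, "b": 1 + repair["a"] + 2}
```

On this tour every customer's arrival time in $G^*$ exceeds its latency in $G$ by exactly half of its own repairtime, which is the discrepancy between the two lower bounds described above.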

In this paper, we present a $3\beta$-approximation algorithm¹ for the GKTR problem,
where $\beta$ is the best achievable approximation ratio (currently 6) for the KTR
problem. When the repairtimes of all the customers are the same, we present an
approximation algorithm with a better ratio². Our ratios hold for the respective
multidepot variants of the GKTR problem as well.



The Bounded-Latency Problem. This problem is a complementary version 
of the KTR problem, in which we are given a latency bound L and are asked 
to find the minimum number of repairmen required to service all the customers 
such that the latency of no customer is more than L. More formally, we can 
define the bounded-latency problem (BLP) as follows: 

BLP: Given a metric defined on a set of vertices, $V$, a source vertex $s \in V$
and a positive number $L$. The objective is to find $k$ tours, each starting at $s$,
covering all the vertices such that the latency of no customer is more than $L$ and
$k$ is minimum.

The bounded-latency problem is very common in real life, as most service
providers work only during the day, generally an 8-hour work day. Under these
circumstances, the service provider naturally wants to provide service to all
its outstanding customers within the work day, using as small a number
of repairmen as possible. For the BLP, we present a bicriteria approximation
algorithm that finds a solution with at most $2/p$ times the number of repairmen
required by an optimal solution, with the latency of no customer exceeding
$(1 + p)L$, $p > 0$.



¹ Recently, Gubbala and Pursnani [9] have improved our analysis to obtain a ratio of
$\frac{3}{2}\beta + \frac{1}{2}$.

² The ratio is 7.25, based on the current best $\beta$ value, which is 6.
2 The GKTR Problem

Our algorithms for the GKTR problem use the KTR algorithm as a black box.
The current best approximation ratio for the KTR problem is 6, due to
Chaudhuri et al. [4]. Their ratio holds for the multidepot case as well.

Throughout this paper, the terms vertex and customer will be used
interchangeably. Let $s$ denote the depot or the starting vertex. Let $G$ be the complete
graph induced by the vertex set $V$. Let $r_i$ denote the repairtime of customer
$i$ (the repairtime of $s$ is zero). Let $l(v)$ denote $v$'s latency. Let $|ab|$ denote the weight
of the edge connecting vertices $a$ and $b$, which is the metric distance between $a$
and $b$.

2.1 Non-uniform Repairtimes 

Let $G = (V, E)$ be the given graph for which a solution is sought. Let $M \subset V$
be the set of vertices with the $k$ largest repairtimes. Let $G'$ be the graph induced by
$V \setminus M$. Construct a new graph $G^*$ from $G'$ such that for every edge $e'$ incident
on vertices $i$ and $j$ in $G'$, there is an edge $e^*$ connecting $i$ and $j$ in $G^*$ with
weight $|ij| + \frac{r_i}{2} + \frac{r_j}{2}$. Set the repairtimes of all the vertices in $G^*$ to zero.
It can be easily seen that the edges in $G^*$ obey the triangle inequality, and that $G^*$
is a KTR instance. Let $opt$, $opt'$ and $opt^*$ denote the total latencies of all the
customers in an optimum solution for $G$, $G'$ and $G^*$, respectively. Let $apx$ and
$apx'$ denote the total latencies of all the customers in our solution for $G$ and $G'$,
respectively. Let $APX'$ and $APX^*$ denote the respective approximate solutions
for $G'$ and $G^*$. Before we proceed to the algorithm and its analysis, we present
the following lemmas.

Lemma 1. $opt \ge opt'$.

Lemma 2. Let $V = \{x_1, \ldots, x_n\}$ be the set of vertices in $G$. Let $r_i$ denote the
repairtime of $x_i$ and let $R_k$ denote the sum of the $k$ largest repairtimes among
the repairtimes of all vertices. Let $opt$ be the sum of the latencies of all vertices
in an optimal solution using $k$ repairmen. Then,

$$opt \ge \sum_{i=1}^{n} \left( |s x_i| + r_i \right) - R_k.$$

Proof. The latency of every vertex $x_i$ in an optimal solution is at least $|s x_i|$,
and the latencies in such a solution have to include at least all but the $k$ largest
repairtimes (the repairtime of the last vertex on each of the $k$ tours delays no
other vertex). Together, these two facts prove the lemma.
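As a small illustration of Lemma 2, the sketch below evaluates the lower bound on a two-customer toy instance (assumed data; `lemma2_lower_bound` is a hypothetical helper name).

```python
def lemma2_lower_bound(depot_dist, repair, k):
    """Return sum_i (|s x_i| + r_i) minus the sum of the k largest repairtimes."""
    largest_k = sorted(repair.values(), reverse=True)[:k]
    return sum(depot_dist[v] + repair[v] for v in repair) - sum(largest_k)

# Assumed toy data: depot at 0 on a line, customer a at 1 and customer b at 3.
depot_dist = {"a": 1, "b": 3}
repair = {"a": 2, "b": 5}

bound_k1 = lemma2_lower_bound(depot_dist, repair, k=1)   # (1 + 2) + (3 + 5) - 5
bound_k2 = lemma2_lower_bound(depot_dist, repair, k=2)   # (1 + 2) + (3 + 5) - 7
```

On this instance the bound happens to be tight: a single repairman visiting a and then b has total latency 6, and two repairmen serving one customer each have total latency 4.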

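Lemma 3. $opt' = opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2}$.

Proof. We prove the lemma by showing that

$$opt' \ge opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2} \quad \text{and} \quad opt' \le opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2}.$$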




Minimum Latency Tours and the k-Traveling Repairmen Problem 427

- $opt' \ge opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2}$. Suppose $opt' < opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2}$. Then, we can
construct the same set of $k$ tours in $G^*$ as in $G'$ such that the $i$-th tour in
$G^*$ visits the same set of vertices as visited by the $i$-th tour in $G'$, and in the
same order. The sum of the latencies of all the customers in such a solution
for $G^*$ will be $opt' + \sum_{i \in V \setminus M} \frac{r_i}{2}$, which is less than $opt^*$ and so contradicts
the fact that $opt^*$ is the optimal sum of latencies for $G^*$.

- $opt' \le opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2}$. Suppose $opt' > opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2}$. Then, we can
construct the same set of $k$ tours in $G'$ as in $G^*$ such that the $i$-th tour in
$G'$ visits the same set of vertices as visited by the $i$-th tour in $G^*$, and in the
same order. The sum of the latencies of all the customers in such a solution
for $G'$ will be $opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2}$, which contradicts the fact that $opt'$ is the
optimal sum of latencies for $G'$.

We first obtain a $\beta$-approximate solution $APX^*$ to $G^*$. Let $t_1, t_2, \ldots, t_k$ be
the set of $k$ tours in $APX^*$. Construct the same set of $k$ tours in $G'$ as in $G^*$
such that the $i$-th tour in $G'$ visits the same set of vertices as visited by the $i$-th
tour in $G^*$, and in the same order. It can be seen that the sum of the latencies
of all the customers in $G'$ is

$$apx' = \beta\, opt^* - \sum_{j \in V \setminus M} \frac{r_j}{2}. \qquad (1)$$

Let $M = \{v_1, v_2, \ldots, v_k\}$ be the set of vertices with the $k$ largest repairtimes in
$G$. Extending the tour $t_i$ in $G'$ to include $v_i$ as its last vertex, for all $i$, gives a
feasible set of $k$ tours for $G$, with $apx$ denoting the sum of the latencies of all
the customers in $G$. The latency of vertex $v_i$, $l(v_i)$, added to the $i$-th tour will
be at most the sum of the latency of its predecessor vertex $p_i$ (the vertex visited
by the $i$-th tour just before visiting $v_i$), $p_i$'s repairtime and $|p_i v_i|$. Let $p_i$ be
the $j$-th vertex visited in the $i$-th tour and let $\{u_1, \ldots, u_{j-1}\}$ be the other vertices
visited by the $i$-th tour before visiting $p_i$. Since $|s p_i| + |s v_i| \ge |p_i v_i|$, where $s$ is
the central depot, the latency of $v_i$ can be written as follows.

$$l(v_i) \le l(p_i) + r_{p_i} + |s p_i| + |s v_i| \le \Big( \sum_{g=1}^{j-1} l(u_g) \Big) + l(p_i) + r_{p_i} + |s p_i| + |s v_i|.$$

The sum of the latencies of all $v_i$'s, where $i = 1, \ldots, k$, is given by

$$\begin{aligned}
\sum_{i=1}^{k} l(v_i) &\le \sum_{i=1}^{k} \Big[ \Big( \sum_{g=1}^{j-1} l(u_g) \Big) + l(p_i) + r_{p_i} + |s p_i| + |s v_i| \Big] \\
&= \sum_{i=1}^{k} \Big[ \Big( \sum_{g=1}^{j-1} l(u_g) \Big) + l(p_i) \Big] + \sum_{i=1}^{k} \Big[ r_{p_i} + |s p_i| + |s v_i| \Big] \\
&\le apx' + opt \quad \text{(by Lemma 2)} \\
&= \beta\, opt^* - \sum_{i \in V \setminus M} \frac{r_i}{2} + opt \quad \text{(by substituting (1))} \qquad (2)
\end{aligned}$$
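Substituting Lemma 3 in equations (1) and (2), we get

$$apx' = \beta \Big( opt' + \sum_{i \in V \setminus M} \frac{r_i}{2} \Big) - \sum_{i \in V \setminus M} \frac{r_i}{2},$$

$$\sum_{i=1}^{k} l(v_i) \le \beta \Big( opt' + \sum_{i \in V \setminus M} \frac{r_i}{2} \Big) - \sum_{i \in V \setminus M} \frac{r_i}{2} + opt.$$

Recall that $\{v_1, v_2, \ldots, v_k\}$ is the set of vertices with the $k$ largest repairtimes in
$G$. The sum of the latencies of all the customers in $G$ is given by

$$\begin{aligned}
apx &= apx' + \sum_{i=1}^{k} l(v_i) \\
&\le 2\beta \Big( opt' + \sum_{i \in V \setminus M} \frac{r_i}{2} \Big) - 2 \sum_{i \in V \setminus M} \frac{r_i}{2} + opt \\
&= 2\beta\, opt' + (\beta - 1) \sum_{i \in V \setminus M} r_i + opt \\
&\le 2\beta\, opt + (\beta - 1) \sum_{i \in V \setminus M} r_i + opt \quad \text{(by Lemma 1)} \\
&\le 2\beta\, opt + (\beta - 1)\, opt + opt \quad \text{(by Lemma 2)} \\
&= 3\beta\, opt.
\end{aligned}$$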
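The scheme analysed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the KTR black box is replaced by an exhaustive search (`brute_force`, effectively $\beta = 1$, usable only on tiny inputs), and all function names and the toy instance are hypothetical.

```python
from itertools import permutations, product

DEPOT = "s"

def tour_latency(tour, w, repair):
    """Sum of latencies along one tour under edge weights w and repairtimes."""
    clock = total = 0.0
    prev = DEPOT
    for v in tour:
        clock += w(prev, v)
        total += clock
        clock += repair.get(v, 0.0)
        prev = v
    return total

def brute_force(customers, k, w, repair):
    """Exhaustive optimum over all assignments and visiting orders.

    With repair = {} this stands in for the KTR black box (beta = 1).
    """
    best_cost, best_tours = None, None
    for labels in product(range(k), repeat=len(customers)):
        tours = []
        for t in range(k):
            group = [c for c, lab in zip(customers, labels) if lab == t]
            tours.append(min(permutations(group),
                             key=lambda p: tour_latency(p, w, repair)))
        cost = sum(tour_latency(t, w, repair) for t in tours)
        if best_cost is None or cost < best_cost:
            best_cost, best_tours = cost, [list(t) for t in tours]
    return best_tours

def gktr_nonuniform(customers, k, dist, repair):
    """Drop M, solve KTR on G*, then append the vertices of M as last stops."""
    M = sorted(customers, key=lambda v: repair[v], reverse=True)[:k]
    rest = [v for v in customers if v not in M]
    r = dict(repair)
    r[DEPOT] = 0.0
    w_star = lambda i, j: dist(i, j) + r[i] / 2 + r[j] / 2
    tours = brute_force(rest, k, w_star, {})   # stand-in for the KTR algorithm
    for tour, v in zip(tours, M):
        tour.append(v)                         # v_i becomes the last vertex of t_i
    return tours

# Assumed toy data: points on a line, depot at position 0.
position = {"s": 0, "a": 1, "b": 2, "c": 4, "d": -3}
repair = {"a": 1, "b": 4, "c": 2, "d": 3}
dist = lambda u, v: abs(position[u] - position[v])
customers = ["a", "b", "c", "d"]

tours = gktr_nonuniform(customers, 2, dist, repair)
apx = sum(tour_latency(t, dist, repair) for t in tours)
opt = sum(tour_latency(t, dist, repair)
          for t in brute_force(customers, 2, dist, repair))
```

Because the stand-in solver is exact, the analysis above gives $apx \le 3\,opt$ on this instance; with a real $\beta$-approximate KTR algorithm the guarantee would be $3\beta\,opt$.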

2.2 Uniform Repairtimes 

At first glance, it looks as though we can convert the original problem into one
in which the repairtime is added to the length of all edges except those with
$s$ as an endpoint, and then use the KTR algorithm as it is to get a ratio of $\beta$.
This is not possible, since such a transformation violates the triangle inequality,
which is a requirement for using the KTR algorithm. We do not know
how to obtain the same ratio as that of the KTR problem.

When the repairtimes of all customers are the same, we show that the
approximation guarantee can be improved considerably. To achieve a smaller
approximation ratio, we present two algorithms that work in tandem. As before,
let $\beta$ be the best achievable approximation ratio for the KTR problem.

Let $G = (V, E)$ be the given graph for which a solution is sought. Let $r$
denote the repairtime of each customer, i.e., $r_i = r$ for all $i \ne s$ (the repairtime of $s$ is zero).
Construct a new graph $G^*$ from $G$ such that for every edge $e$ incident on vertices
$i$ and $j$ in $G$, there is an edge $e^*$ connecting $i$ and $j$ in $G^*$ with weight
$|ij| + \frac{r_i}{2} + \frac{r_j}{2}$ (which equals $|ij| + r$ when neither endpoint is $s$). Set the
repairtimes of all the vertices in $G^*$ to zero.

It can be easily seen that the edges in $G^*$ obey the triangle inequality and that $G^*$ is
a KTR instance. Let $opt$ and $opt^*$ denote the total latencies of all the customers
in an optimum solution for $G$ and $G^*$, respectively. Let $apx$ denote the total
latency of all the customers in our solution for $G$. Let $n$ denote the number of
vertices in the graph.
Proposition 1. Let $C$ be a positive constant. Let $x$ and $y$ be variables such that
$x + y = C$. Then $x(x - 1) + y(y - 1)$ is minimum when $x = y$.

Lemma 4. In any optimal solution, the contribution to the sum of latencies due
to repairtimes alone is at least $\frac{rn}{2}\left(\frac{n}{k} - 1\right)$.

Proof. Suppose that for a given $n$ and $k$, there exists an optimal solution for
which the contribution due to repairtimes alone is $opt_r$. If, in such an optimal
solution, there exist two repairmen who visit different numbers of customers,
say $y$ and $z$, then we can always construct an alternate solution $ALT$ from the
optimal solution by making those two repairmen visit $\frac{y+z}{2}$ customers each. By
Proposition 1, the contribution to the sum of latencies, in $ALT$, due to repairtimes
alone will be less than $opt_r$. We can continue in this manner until all repairmen
visit the same number of customers, which is $\frac{n}{k}$. That brings us to the following
inequality, which proves the lemma.

$$opt_r \ge r k \cdot \frac{\frac{n}{k}\left(\frac{n}{k} - 1\right)}{2} = \frac{r n \left(\frac{n}{k} - 1\right)}{2}.$$
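The balancing argument can be checked on small numbers. The sketch below assumes that the $j$-th customer on a tour waits for $j - 1$ earlier repairs, so a tour with $m$ customers contributes $r\,m(m-1)/2$; the function names are illustrative.

```python
def repair_contribution(group_sizes, r):
    """Repairtime-only latency when repairman g serves group_sizes[g] customers.

    A tour with m customers contributes r * (0 + 1 + ... + (m - 1)).
    """
    return sum(r * m * (m - 1) / 2 for m in group_sizes)

def lemma4_bound(n, k, r):
    """Return r * n * (n/k - 1) / 2, the bound stated in Lemma 4."""
    return r * n * (n / k - 1) / 2

balanced = repair_contribution([4, 4, 4], r=2)     # n = 12, k = 3
unbalanced = repair_contribution([6, 3, 3], r=2)   # same n and k
bound = lemma4_bound(12, 3, 2)
```

The balanced split meets the bound exactly, while the unbalanced split contributes more, as Proposition 1 predicts.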



Algorithm 1. This algorithm proceeds on a case-by-case basis, based on the
value of $k$ with respect to $n$. Let the customers be sorted in non-decreasing order
with respect to their distances to the depot. Let $A = \{c_1, \ldots, c_n\}$ be the sorted
set of $n$ customers, i.e., $|sc_1| \le |sc_2| \le \ldots \le |sc_n|$.

Fig. 2. (a) $k \ge \frac{n}{2}$ (b) $\frac{n}{3} \le k < \frac{n}{2}$ (c) $\frac{n}{4} \le k < \frac{n}{3}$ (d) $0.22598n \le k < \frac{n}{4}$
1. Case $k \ge \frac{n}{2}$. The $i$-th repairman visits customer $c_i$ first, $\forall i \le k$. In addition,
repairmen 1 to $n - k$ are assigned to visit one customer each, from the
remaining pool of $n - k$ unassigned customers, as their second customer.
Let $t_i$ be one of the $k$ tours constructed in this manner. Let $c_i$, and maybe
$c_l$, be the customers visited by tour $t_i$, in that order. The latency of $c_i$ would
just be $|sc_i|$ and the latency of $c_l$ would be at most $|sc_i| + r + |sc_i| + |sc_l|$.
Fig. 2(a) depicts this pictorially. The sum of the latencies of customers $c_i$
and $c_l$ is $|sc_i| + |sc_l| + r + 2|sc_i|$. The sum of the latencies of all the customers
visited by the $k$ tours is then at most $\sum_{j=1}^{n} |sc_j| + (n - k)r + 2\sum_{j=1}^{n-k} |sc_j|$. By
Lemma 2, the sum of the latencies is at most $opt + 2\sum_{j=1}^{n-k} |sc_j|$. Since $A$ is
sorted in non-decreasing order, the approximation ratio is less than or equal
to $1 + 2\left(\frac{n-k}{n}\right) = 3 - 2\left(\frac{k}{n}\right)$. As $k \ge \frac{n}{2}$, the ratio is at most 2.

2. Case $\frac{n}{3} \le k < \frac{n}{2}$. The $i$-th repairman visits customer $c_i$ first, $\forall i \le k$. Each
repairman picks one customer out of $\{c_{k+1}, \ldots, c_{2k}\}$ to be his next customer.
In addition, repairmen 1 to $(n - 2k)$ are assigned to visit one customer each,
from the remaining pool of $n - 2k$ unassigned customers, as their third
customer.

Let $t_i$ be one of the $k$ tours constructed in this manner. Let $c_i$, $c_l$, and
maybe $c_{ll}$, be the customers visited by tour $t_i$, in that order. The latency
of $c_i$ would just be $|sc_i|$. The latencies of $c_l$ and $c_{ll}$ would be at most
$|sc_i| + r + |sc_i| + |sc_l|$ and $|sc_i| + r + |sc_i| + |sc_l| + r + |sc_l| + |sc_{ll}|$,
respectively (see Fig. 2(b)). The sum of the latencies of customers $c_i$, $c_l$ and
$c_{ll}$ is $|sc_i| + |sc_l| + |sc_{ll}| + 4|sc_i| + 2|sc_l| + r + 2r$. The sum of the latencies
of all the customers visited by the $k$ tours is given by

$$\begin{aligned}
apx &= \sum_{i=1}^{n} |sc_i| + 4 \sum_{i=1}^{k} |sc_i| + 2 \sum_{j=k+1}^{n-k} |sc_j| + kr + (n - 2k)2r \\
&= \Big[ \sum_{i=1}^{n} |sc_i| + (n - k)r \Big] + \Big[ \sum_{i=1}^{n} |sc_i| + (n - k)r \Big] + \Big[ 4 \sum_{i=1}^{k} |sc_i| - \sum_{i=1}^{n} |sc_i| \Big] + 2 \sum_{j=k+1}^{n-k} |sc_j| - kr \\
&\le opt + \frac{4k - n}{n} \sum_{i=1}^{n} |sc_i| + opt + \frac{2(n - 2k)}{n - k} \sum_{j=k+1}^{n} |sc_j| \quad \text{(by Lemma 2 and the sorted order of } A\text{)} \\
&\le 2\,opt + \frac{4k - n}{n}\, opt + \frac{2(n - 2k)}{n - k}\, opt \\
&= \Big[ 1 + \frac{4k}{n} + \frac{2(n - 2k)}{n - k} \Big] opt \\
&= \Big[ 1 + 4x + \frac{2(1 - 2x)}{1 - x} \Big] opt,
\end{aligned}$$

where $x = \frac{k}{n}$. Since $\frac{n}{3} \le k < \frac{n}{2}$, it turns out that $apx \le \frac{10}{3}\,opt$.







3. Case $\frac{n}{4} \le k < \frac{n}{3}$. The algorithm (see Fig. 2(c)) and its analysis proceed
in the same manner as in case 2, and the sum of the latencies of all the
customers visited by the $k$ tours is bounded by

$$apx \le \Big[ 1 + 6x + \frac{4x}{1 - x} + \frac{2(1 - 3x)}{1 - 2x} \Big] opt,$$

where $x = \frac{k}{n}$. Since $\frac{n}{4} \le k < \frac{n}{3}$, it turns out that $apx \le 5.027\,opt$.

4. Case $0.22598n \le k < \frac{n}{4}$. The algorithm (see Fig. 2(d)) and its analysis proceed
in the same manner as in case 2, and the sum of the latencies of all the
customers visited by the $k$ tours is bounded by

$$apx \le \Big[ 1 + 8x + \frac{6x}{1 - x} + \frac{4x}{1 - 2x} + \frac{2(1 - 4x)}{1 - 3x} \Big] opt,$$

where $x = \frac{k}{n}$. Since $0.22598n \le k < \frac{n}{4}$, it turns out that $apx \le 7\,opt$.
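The constants quoted for cases 2 to 4 can be checked numerically. The sketch below evaluates the three bracketed expressions on a fine grid of values of $x = k/n$; the function names and the grid search are illustrative, and the interval endpoints are the case boundaries as read here (an assumption where the source is hard to read).

```python
def case2(x):
    return 1 + 4 * x + 2 * (1 - 2 * x) / (1 - x)

def case3(x):
    return 1 + 6 * x + 4 * x / (1 - x) + 2 * (1 - 3 * x) / (1 - 2 * x)

def case4(x):
    return (1 + 8 * x + 6 * x / (1 - x) + 4 * x / (1 - 2 * x)
            + 2 * (1 - 4 * x) / (1 - 3 * x))

def grid_max(f, lo, hi, steps=100_000):
    """Largest value of f on an evenly spaced grid over [lo, hi]."""
    return max(f(lo + (hi - lo) * t / steps) for t in range(steps + 1))

worst2 = grid_max(case2, 1 / 3, 1 / 2)     # largest at the left endpoint x = 1/3
worst3 = grid_max(case3, 1 / 4, 1 / 3)     # largest in the interior of the interval
worst4 = grid_max(case4, 0.22598, 1 / 4)   # approached as x tends to 1/4
```

Under these assumptions the grid maxima come out at about 3.333, 5.027 and 7, in line with the ratios stated for the three cases.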



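Algorithm 2. Just as in the non-uniform repairtime case, we find a $\beta$-approximate
solution $APX^*$ to $G^*$. Let $t_1, t_2, \ldots, t_k$ be the set of $k$ tours in it. Construct the
same set of $k$ tours in $G$ as in $G^*$ such that the $i$-th tour in $G$ visits the same set
of vertices as visited by the $i$-th tour in $G^*$, and in the same order. It can be seen
that the sum of the latencies of all the customers in $G$ is

$$apx = \beta\, opt^* - \sum_{i=1}^{n} \frac{r}{2} = \beta\, opt^* - \frac{nr}{2}. \qquad (3)$$

By Lemma 3,

$$opt = opt^* - \sum_{i=1}^{n} \frac{r}{2} = opt^* - \frac{nr}{2}.$$

Substituting for $opt^*$ in equation (3), we get

$$apx = \beta\, opt + \Big( \frac{\beta - 1}{2} \Big) nr.$$

Using Lemma 4, the approximation ratio of Algorithm 2 can be calculated
from the above equation as follows.

$$\frac{apx}{opt} = \frac{\beta\, opt + \left(\frac{\beta - 1}{2}\right) nr}{opt} = \beta + \frac{\left(\frac{\beta - 1}{2}\right) nr}{opt} \le \beta + \frac{\left(\frac{\beta - 1}{2}\right) nr}{\frac{rn\left(\frac{n}{k} - 1\right)}{2}} = \beta + \frac{\beta - 1}{\frac{n}{k} - 1}. \qquad (4)$$

Substituting $\beta = 6$, it can be easily verified that the ratio is at most 7.25 when
$k$ is sufficiently small relative to $n$. Since the ratio is bounded by 7 for the larger
values of $k$ (Algorithm 1) and by 7.25 for the smaller values of $k$ (Algorithm 2), we
get the following theorem.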
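The bound in (4) is easy to evaluate. The sketch below assumes that (4) reads $\beta + (\beta - 1)/(n/k - 1)$; the function name and the sample values of $n$ and $k$ are illustrative.

```python
def algorithm2_ratio_bound(beta, n, k):
    """Evaluate beta + (beta - 1) / (n/k - 1); requires k < n."""
    return beta + (beta - 1) / (n / k - 1)

bound_at_fifth = algorithm2_ratio_bound(6, 100, 20)   # n/k = 5
bound_at_tenth = algorithm2_ratio_bound(6, 100, 10)   # n/k = 10
```

The bound shrinks as $k$ gets smaller relative to $n$, which is why Algorithm 2 is the one used when there are few repairmen.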
3 The Bounded-Latency Problem

The bounded-latency problem (BLP) is a complementary version of the KTR
problem, in which we are given a latency bound $L$ and are asked to find the
minimum number of repairmen required to service all the customers such that
the latency of no customer is more than $L$. Unlike the GKTR problem, where
the sum of the latencies is minimized, the objective of the BLP is to
minimize the number of repairmen subject to the constraint that the latency of no
customer exceeds $L$. For this problem, we present a bicriteria algorithm that
finds a solution with at most $2/p$ times the number of repairmen required by an
optimal solution, with the latency of no customer exceeding $(1 + p)L$, $p > 0$.

Proposition 2. Length of any optimal set of tours for the BLP is at least the 
length of an MST. 

Given below is an algorithm which groups the customers, so that a repairman 
can be assigned to each of the groups. Let p > 0. 

1. Construct a tour for the given set of vertices (depot and customers) using
the best available approximation algorithm for TSP.

2. Remove the depot from the tour.

3. Set lengthTraveled = 0.

4. Starting from some vertex, traverse the tour.

5. While not all edges in the tour are traversed, traverse the next edge e on the
tour.

a) If lengthTraveled + length(e) ≤ pL, set lengthTraveled +=
length(e).

b) Else remove e from the tour, and set lengthTraveled = 0.

At the end of the above algorithm, we will be left with segments, each of
length at most $pL$. For each segment, introduce two edges to connect its
endpoints (vertices) to the depot. Since our tour is of length at most twice that
of an MST, by Proposition 2 our solution will require at most $2/p$ times the
number of repairmen used by an optimal solution. Assuming that there exists a
feasible solution for a given instance, the length of an edge connecting any vertex
to the depot is at most $L$. Hence, regardless of the direction in which each tour in
our solution is traversed, each customer will have a latency of at most $(1 + p)L$.
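The grouping step can be sketched as follows. The sketch takes the TSP tour as given (any approximation algorithm could supply it), treats the tour with the depot removed as a path, and returns the segments; one repairman is then assigned to each segment. The function name and the toy data are assumptions.

```python
def split_tour(order, dist, p, L):
    """Cut a depot-free tour into segments of length at most p * L.

    order : customers in the order of the TSP tour, with the depot removed
    dist  : function dist(a, b) giving the metric distance
    """
    segments = [[order[0]]]
    length_traveled = 0.0
    for prev, nxt in zip(order, order[1:]):
        step = dist(prev, nxt)
        if length_traveled + step <= p * L:
            length_traveled += step        # stay in the current segment
            segments[-1].append(nxt)
        else:
            segments.append([nxt])         # edge removed: start a new segment
            length_traveled = 0.0
    return segments

# Assumed toy data: depot at 0, customers named by their position on a line.
dist = lambda a, b: abs(a - b)
segments = split_tour([1, 2, 3, 7, 8], dist, p=0.2, L=10)   # p * L = 2
```

Here the edge from 3 to 7 is too long to fit in the current segment, so it is removed and two repairmen are used. Each customer is within $L$ of the depot and each segment has length at most $pL$, so no latency exceeds $(1 + p)L$.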
Abstract. We explain how the apparent goals of the Unix CPU scheduling
policy can be formalized using the weighted $\ell_p$ norm of flows. We
then show that the online algorithm, Highest Density First (HDF), and
the nonclairvoyant algorithm, Weighted Shortest Elapsed Time First
(WSETF), are almost fully scalable. That is, they are $(1 + \epsilon)$-speed
$O(1)$-competitive. Even for unit weights, it was known that there is no
$O(1)$-competitive algorithm. We also give a generic way to transform an
algorithm $A$ into an algorithm $B$ in such a way that if $A$ is $O(1)$-speed
$O(1)$-competitive with respect to some $\ell_p$ norm of flow then $B$ is
$O(1)$-competitive with respect to the $\ell_p$ norm of completion times. Further, if
$A$ is online (nonclairvoyant) then $B$ is online (nonclairvoyant). Combining
these results gives an $O(1)$-competitive nonclairvoyant algorithm for
$\ell_p$ norms of completion times.

1 Introduction 

1.1 Motivation 

Tanenbaum [15, page 704] describes the generic Unix CPU scheduling policy as 
follows. Each process initially has a nice value in the range -20 to 20. Lower nice 
values correspond to processes that are more important. Users can set the nice 
value of a process to be in the range from 0 to 20 with a nice system call. Only 
the system administrator can give a process a negative nice value. Once a second 
the priority of a process is recalculated using the formula: 

priority = CPUusage + nice + base 

Here the CPUusage parameter is an exponential weighted moving average of past 
CPU usage, the nice parameter is the nice value for the process, and the base 
parameter is used to give higher priority to jobs that have just returned from 
some sort of interruption (say for I/O). Confusingly enough, the high priority
jobs are those whose computed priority value is smallest. The jobs with highest
priority are then scheduled using a Round Robin (RR) policy, typically with the
quantum on the order of 100 milliseconds. Round Robin shares the processor equally
among all processes.

Round Robin represents an apparent effort to balance between optimizing 
the worst case Quality of Service (QoS) and optimizing the average case QoS. 
If the goal was to optimize worst case QoS then the best algorithm would be 
First Come First Served (FCFS). If the goal was to optimize average QoS then 
Shortest Elapsed Time First (SETF) is generally considered to be the best non- 
clairvoyant algorithm. Processes with lower nice values get more of the CPU, 
but the CPUusage parameter works to try to prevent starvation. That is, the 
CPUusage parameter will be high for processes that have been run a lot recently,
and thus these processes will have a higher computed priority value (that is, a
lower priority), and will be given less CPU time in the near future. So it seems that the
Unix system designers’ goals for the process scheduling policy were: 

Goal A: Amongst jobs of the same priority, there should be some balance be- 
tween optimizing for average QoS and optimizing for worst case QoS. 

Goal B: Higher priority jobs should get a greater share of the CPU resources, 

but lower priority jobs should not be starved. 

In this paper we try to formalize these goals and then analyze algorithms 
with respect to this formalization. 

In the literature, the most common QoS measure for a single process/job
$J_i$ is clearly flow/response/waiting time $f_i = C_i - r_i$, where $C_i$ is the time that
the job completes and $r_i$ is the time that the job enters the system. The most
common way to compromise between optimizing for the average and optimizing
for the worst case is to optimize the $\ell_p$ norm, generally for something like $p = 2$
or $p = 3$. For example, the standard way to fit a line to a collection of points
is to pick the line with minimum least squares, equivalently $\ell_2$, distance to the
points, and Knuth's TeX typesetting system uses the $\ell_3$ metric to determine line
breaks [12, page 97]. The $\ell_p$, $1 < p < \infty$, metric still considers the average in the
sense that it takes into account all values, but because $x^p$ is a strictly convex
function of $x$, the $\ell_p$ norm more severely penalizes outliers than the standard $\ell_1$
norm. Analyses of algorithms for optimizing the $\ell_p$ norms of flow, $\left( \sum_i f_i^p \right)^{1/p}$,
can be found in [3].
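A two-line computation shows how the $\ell_p$ norm penalizes outliers. The flow-time vectors below are made-up data chosen to have the same total; `lp_norm` is an illustrative helper.

```python
def lp_norm(flows, p):
    """Return (sum_i f_i^p)^(1/p) for a list of flow times."""
    return sum(f ** p for f in flows) ** (1.0 / p)

balanced = [5, 5, 5, 5]    # every job waits equally long
skewed = [1, 1, 1, 17]     # same total flow, one job nearly starved

same_l1 = lp_norm(balanced, 1) == lp_norm(skewed, 1)
l2_balanced = lp_norm(balanced, 2)
l2_skewed = lp_norm(skewed, 2)
```

Both schedules have the same $\ell_1$ norm (total flow 20), but the $\ell_2$ norm of the skewed schedule is noticeably larger, so an algorithm optimizing $\ell_2$ prefers the balanced one.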

The most common way that priorities of jobs are formalized is to assume that
each job $J_i$ has a positive weight $w_i$ and then to have the objective function be
maximizing the weighted QoS. By far the most commonly studied QoS measure
for a collection of equal priority jobs is average flow time, and logically enough,
the most commonly studied QoS measure for jobs with variable priorities is
weighted flow time $\sum_i w_i f_i$, e.g. [2, 5, 6, 7]. It is easy to see that even an optimal
algorithm for optimizing weighted flow time does not in general accomplish Goal
B, as it can starve low weight jobs if there are always higher weight jobs to be
run.

If one wishes to achieve both Goal A and Goal B, then the appropriate
objective function to optimize would be something like the weighted $\ell_p$
norms of flow, that is, $\left( \sum_i w_i f_i^p \right)^{1/p}$, where $p > 1$ is some small constant. Note
that in any competitive schedule for the weighted $\ell_p$ norm of flow, a low weight
job $J_i$ would eventually be scheduled even in the face of a constant stream of
high weight jobs.

In [3] it was shown that there is no $O(1)$-competitive online scheduling
algorithm for any unweighted $\ell_p$ norm of flow. This motivated the authors of [3],
and us, to fall back to resource augmentation analysis [9]. In the context of a
scheduling minimization problem with an objective function $F$, an algorithm $A$
is $s$-speed $c$-competitive if

$$\max_{I} \frac{F(A_s(I))}{F(Opt_1(I))} \le c$$

where $A_s(I)$ denotes the schedule that algorithm $A$ with a speed $s$ processor produces
on input $I$, and similarly $Opt_1(I)$ denotes the adversarial schedule for $I$ with
a unit speed processor. A $(1 + \epsilon)$-speed $O(1)$-competitive algorithm is said to
be almost fully scalable [13]. The intuition is that such an algorithm should
perform well up to load close to the capacity of the system since increasing speed
corresponds to lowering the load. This intuition is borne out in the lower bound
instances, such as those in [3], that show no algorithm can be $O(1)$-competitive.
In the lower bound instances, the system is fully loaded, so that there are no spare
resources to recover from even small mistakes in scheduling decisions. For a more
in-depth discussion of this motivation see [9,3,13]. In [3] it is shown that several
standard algorithms (SETF, Shortest Remaining Processing Time (SRPT),
and Shortest Job First (SJF)) are almost fully scalable for any $\ell_p$ norm of flow.
Surprisingly, RR is not almost fully scalable for any $\ell_p$ norm of flow. Note that
this result would argue against the use of RR by Unix.



1.2 Our Results 

We first show in section 3 that the results in [3] can be extended to the case
where the objective function is the weighted $\ell_p$ norm of flow. In particular, we
show that the algorithm Highest Density First (HDF) is almost fully scalable.
HDF always runs the job that has the largest weight to work ratio. HDF is
the natural generalization of SJF. Note however that HDF is clairvoyant, that
is, it needs to know the work of a job at its release time. While this might be
reasonable in a web server serving static documents, this is not reasonable in
the context of an operating system.

We then show in section 4 that the obvious generalization
of the nonclairvoyant algorithm SETF, Weighted Shortest Elapsed Time First
(WSETF), is almost fully scalable. For a job $J_i$, let $x_i(t)$ denote the amount
of work done on that job by time $t$. We define the measure of a job $J_i$ as
$\|J_i\|_t = x_i(t)/w_i$. Amongst the jobs with the smallest measure, WSETF splits the
processor proportionally to the weights of the jobs. So, if $J_1, \ldots, J_k$ are the jobs that
have the smallest measure, then the job $J_j$ will receive a $w_j / \left( \sum_{i=1}^{k} w_i \right)$ fraction
of the processor. Thus this result suggests the adoption of the algorithm WSETF
by Unix.
An interesting aspect of our analysis of HDF and WSETF is that we first
transform the problem on the weighted instance to a related problem on the
unweighted instance. This makes the problem simpler and also allows us to use
previous results on unweighted scheduling.

There is a lot of literature on scheduling to minimize total/average completion
time (a nice survey can be found in [11]), and average weighted completion
time [8,1]. While this does not appear to be an interesting objective function from
a computer systems point of view, it seems to be of general academic interest. So
one natural academic question to ask is whether there are good online algorithms
when the objective is the $\ell_p$ norm of completion time, or the weighted $\ell_p$ norm
of completion time. In section 5 we give a rather generic way to transform an
algorithm for a flow time problem, which possibly uses resource augmentation, to
obtain an algorithm for the corresponding completion time problem, which does
not use resource augmentation. A nice property of our transformation is that
online algorithms are transformed to online algorithms, and nonclairvoyant
algorithms are transformed to nonclairvoyant algorithms. As a corollary of this
result, we will obtain $O(1)$-competitive online and nonclairvoyant algorithms
for minimizing the $\ell_p$ norms of weighted completion time.

1.3 Other Related Results 

The following results are known about online algorithms when the objective
function is average flow time. The competitive ratio of every deterministic
nonclairvoyant algorithm is $\Omega(n^{1/3})$, and the competitive ratio of every randomized
nonclairvoyant algorithm against an oblivious adversary is $\Omega(\log n)$ [14]. The
randomized nonclairvoyant algorithm RMLF, proposed in [10], is $O(\log n)$-competitive
against an oblivious adversary [4]. The online clairvoyant algorithm SRPT is
optimal. The online clairvoyant algorithm SJF is almost fully scalable [5]. The
nonclairvoyant algorithm SETF is almost fully scalable [9].

For online weighted flow time, the best known competitive ratio is
$O(\log W)$ [2]. It is an outstanding open question whether an $O(1)$-competitive
algorithm exists.

2 Definitions 

We assume a collection of jobs $J = J_1, \ldots, J_n$. For $J_i$, the release time is denoted
by $r_i$, the work/size by $p_i$, and the weight by $w_i$. Without loss of generality we assume
that all job sizes and job weights are integers. The completion time $c_i^S$ of a job
$J_i$ in a schedule $S$ is the first time after $r_i$ where $J_i$ has been processed for $p_i$
time units. The flow time of $J_i$ in $S$ is $f_i^S = c_i^S - r_i$. A clairvoyant algorithm
learns $p_i$ at time $r_i$. A nonclairvoyant algorithm only knows a lower bound on $p_i$
equal to the length of time that it has run $J_i$. For an algorithm $A$ on an input
instance $I$ with an $s$ speed processor, let $F^p(A, I, s)$ denote the sum of the $p$-th
powers of the flow time of all jobs. Similarly, $WF^p(A, I, s)$ will denote the sum
of weighted $p$-th powers of the flow time (i.e. $w_i f_i^p$) of all jobs. Finally, for the
measure $F^p$, let $Opt(F^p, I, s)$ denote the value of the optimum schedule for the
$F^p$ measure on $I$ with a speed $s$ processor. Similarly, let $Opt(WF^p, I, s)$ denote
the optimum value for the $WF^p$ measure.

3 Analysis of HDF 

In this section we show that HDF, a natural generalization of SJF, is a
$(1 + \epsilon)$-speed $O(1/\epsilon^2)$-competitive online algorithm for minimizing the weighted $\ell_p$
norms of flow time.

The algorithm HDF at any time works on the job which has the largest weight
to processing time ratio. Ties are broken in favor of the partially executed
job. We will show that

Theorem 1. HDF is $(1 + \epsilon)$-speed, $O(1/\epsilon^2)$-competitive for minimizing the
weighted $\ell_p$ norms of flow time.
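The selection rule of HDF can be sketched as a single comparison. The job representation (dicts with `weight`, `size` and `done` fields) and the function name are assumptions made for illustration; exact fractions are used so that ties in density are detected reliably.

```python
from fractions import Fraction

def hdf_pick(alive):
    """Return the job HDF would run among the alive jobs.

    Each job is a dict with integer 'weight' and 'size' (total work) and a
    numeric 'done' (work already received).
    """
    def key(job):
        density = Fraction(job["weight"], job["size"])
        return (density, job["done"] > 0)   # ties go to a partially executed job
    return max(alive, key=key)

jobs = [
    {"name": "A", "weight": 2, "size": 4, "done": 0},
    {"name": "B", "weight": 1, "size": 2, "done": 1},
    {"name": "C", "weight": 1, "size": 5, "done": 0},
]
chosen = hdf_pick(jobs)["name"]
```

Jobs A and B have the same density 1/2, and B is chosen because it has already been partially executed.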

The main idea of the proof will be to reduce the weighted problem to an
unweighted problem and then invoke the result for $\ell_p$ norms of unweighted flow
time. We first define the relevant notation.

Given an instance $I$, we define an instance $I'$ obtained by applying the
following transformation to each job in $I$: Consider a job $J_i \in I$. The instance $I'$ is
obtained by replacing $J_i$ by $w_i$ identical jobs, each of size $p_i/w_i$, weight 1, and
release time $r_i$. We denote these $w_i$ jobs by $J_i^1, \ldots, J_i^{w_i}$. Let $X_i = \{J_i^1, \ldots, J_i^{w_i}\}$
denote this collection of jobs obtained from $J_i$. Note that all jobs in $I'$ have the
same weight.
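The transformation from $I$ to $I'$ is mechanical and can be sketched as follows. The dictionary layout of a job and the naming of the copies are illustrative choices; sizes are kept as exact fractions since $p_i/w_i$ need not be an integer.

```python
from fractions import Fraction

def to_unweighted(jobs):
    """Replace each job of weight w by w unit-weight jobs of size p / w."""
    out = []
    for job in jobs:
        for copy in range(1, job["weight"] + 1):
            out.append({
                "name": f'{job["name"]}^{copy}',
                "release": job["release"],
                "size": Fraction(job["size"], job["weight"]),
                "weight": 1,
            })
    return out

instance = [
    {"name": "J1", "release": 0, "size": 6, "weight": 3},
    {"name": "J2", "release": 2, "size": 5, "weight": 2},
]
unweighted = to_unweighted(instance)
```

The total work is preserved: the three copies of J1 have size 2 each and the two copies of J2 have size 5/2 each.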

Lemma 1. For $I$ and $I'$ as defined above,

$$Opt(F^p, I', 1) \le Opt(WF^p, I, 1) \qquad (1)$$

Proof. Let $S$ be the schedule which minimizes the weighted $\ell_p$ norm of flow time
for $I$. Given $S$, we create a schedule for $I'$ as follows. At any time $t$, work on
a job in $X_i$ if and only if $J_i$ is executed at time $t$ under $S$. Clearly, all jobs in
$X_i$ finish when $J_i$ finishes execution, thus no job in $X_i$ has a flow time higher
than that of $J_i$. By definition, the contribution of $J_i$ to $WF^p$ is $w_i f_i^p$. Also, the
contribution to the measure $F^p$ of each of the $w_i$ jobs in $X_i$ will be at most $f_i^p$,
and hence the total contribution of jobs in $X_i$ to $F^p$ is at most $w_i f_i^p$. Since the
optimum schedule for $I'$ can be no worse than the schedule constructed above,
the result follows.

From Theorem 3 in [3] we know that SJF is $(1 + \epsilon)$-speed, $O(1/\epsilon)$-competitive
for the (unweighted) $\ell_p$ norms of flow time, or equivalently SJF is $(1 + \epsilon)$-speed
$O(1/\epsilon^p)$-competitive for the $F^p$ measure. This implies that

$$F^p(SJF, I', 1 + \epsilon) = O\Big(\frac{1}{\epsilon^p}\Big)\, Opt(F^p, I', 1) \qquad (2)$$

We now relate the performance of HDF on $I$ with a $(1 + \epsilon)$ times faster
processor to that of SJF on $I'$.
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Lemma 2. 



WFP{BDF,T, 1 + e) < (1 + -YF^{SJF,T\ 1) (3) 

e 

Proof. We claim that for every job $J_i \in \mathcal{I}$ and every time $t$, if $J_i$ is alive at time
$t$ under HDF with a $1+\epsilon$ speed processor, then at least $\frac{\epsilon}{1+\epsilon} w_i$ jobs in $X_i \in \mathcal{I}'$
are alive at time $t$ under SJF with a 1 speed processor.

The claim above immediately implies the result for the following reason.
Consider the time $t^- = (f_i + r_i)^-$ just before $J_i$ finishes execution under HDF.
Then $J_i$ contributes exactly $w_i f_i^p$ to $WF^p(HDF, \mathcal{I}, 1+\epsilon)$, while the at least $\epsilon w_i/(1+\epsilon)$
jobs in $X_i$ that are unfinished by time $t^-$ contribute at least $\frac{\epsilon w_i}{1+\epsilon} f_i^p$ to
$F^p(SJF, \mathcal{I}', 1)$. Summing the contributions over all jobs, the result follows.

We now prove the claim. Suppose for the sake of contradiction that $t$ is the
earliest time when $J_i$ is alive under HDF and there are fewer than $\frac{\epsilon}{1+\epsilon} w_i$
jobs from $X_i$ left under SJF. Since $J_i$ is alive under HDF and HDF has a $1+\epsilon$
faster processor, it has spent less than $p_i/(1+\epsilon)$ time on $J_i$, whereas SJF has
spent strictly more than $p_i/(1+\epsilon)$ time on $X_i$. Thus there was some time $t'$,
such that $r_i < t' < t$, during which HDF was running $J_j \ne J_i$ while SJF was
working on some job from $X_i$. Since $t' > r_i$, it follows from the property of HDF
that $J_j$ has higher density than that of $J_i$. This implies that jobs in $X_j$ have
smaller size than those in $X_i$. Since SJF works on $X_i$ at time $t'$, it must have already
finished all the jobs in $X_j$ by $t'$. Since $J_j$ is alive at time $t'$, this contradicts our
assumption of the minimality of $t$.

Proof (of Theorem 1). By Equations 2 and 3 we have that

$$WF^p(HDF, \mathcal{I}, (1+\epsilon)^2) = O(1/\epsilon^{p+1})\, Opt(F^p, \mathcal{I}', 1)$$

Combining this with Equation 1 gives us the result. 



4 Analysis of WSETF 

4.1 Algorithm Description 

For a job $J_i$ with weight $w_i$, let $p_i(t)$ denote the amount of work done on $J_i$ by
time $t$. We define the norm of a job $J_i$ as $\|J_i\|_t = p_i(t)/w_i$.

Algorithm WSETF: At all times, WSETF splits the processor, proportional to
weights of the jobs, among the jobs $J_i$ that have the smallest norm $\|J_i\|_t$. So, if
$J_1, \ldots, J_k$ are the jobs that have the smallest norm, then $J_i$, for $i = 1, \ldots, k$,
will receive a fraction $w_i / \sum_{j=1}^{k} w_j$ of the processor.
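The sharing rule can be illustrated with a minimal Python sketch. It assumes the norm of a job is $p_i(t)/w_i$ (the per-unit-weight service used in Lemma 3) and that weights are positive integers; the function name `wsetf_shares` is hypothetical and not taken from the paper.

```python
from fractions import Fraction


def wsetf_shares(work_done, weights):
    """Return the processor fraction each job receives under the WSETF rule.

    work_done[i] plays the role of p_i(t) and weights[i] the role of w_i.
    Assumption: the norm of job i is p_i(t) / w_i.
    """
    norms = [Fraction(p) / Fraction(w) for p, w in zip(work_done, weights)]
    smallest = min(norms)
    # Only the jobs whose norm is currently smallest are served.
    active = [i for i, norm in enumerate(norms) if norm == smallest]
    total_weight = sum(weights[i] for i in active)
    # The processor is split among them in proportion to their weights.
    return {i: Fraction(weights[i], total_weight) for i in active}


shares = wsetf_shares([2, 4, 3], [1, 2, 1])
```

In this example jobs 0 and 1 both have norm 2, so they share the processor in the ratio 1 : 2, while job 2 (norm 3) waits.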

Note that for all jobs $J_i$ that WSETF executes, the norm increases at the
same rate, and thus the norms of these jobs remain equal.

4.2 Analysis

As in the analysis of HDF, the main step of our analysis will be to relate the
behavior of WSETF on an instance $\mathcal{I}$ with weighted jobs to that of SETF on
another instance $\mathcal{I}'$ which consists of unweighted jobs. We then use the results
about (unweighted) $\ell_p$ norms of flow time under SETF to obtain results for
WSETF.

Given an instance $\mathcal{I}$ consisting of weighted jobs, let $\mathcal{I}'$ denote the instance
defined as in Section 3 which consists of unweighted jobs. Suppose we run WSETF
on $\mathcal{I}$ and SETF on $\mathcal{I}'$ with the same speed processor. Then the schedules
produced by WSETF and SETF are related by the following simple observation.



Lemma 3. At any time $t$, a job $J_i \in \mathcal{I}$ is alive and has received $p_i(t)$ units
of service if and only if each job in $X_i \in \mathcal{I}'$ is alive and has received exactly
$p_i(t)/w_i$ amount of service. In particular, this implies that if $J_i$ has flow time $f_i$
then each $J_i^k \in X_i$ for $k = 1, \ldots, w_i$ has flow time $f_i$.

Proof. We view the execution of WSETF on $\mathcal{I}$ as follows: If at any time WSETF
allocates $x$ units of processing to a job of weight $w_i$, then we think of it as
allocating $x/w_i$ units of processing to each of the $w_i$ jobs in the collection $X_i$.
Thus the norm of job $J_i$ under WSETF is exactly equal to the amount of service
received by a job in $X_i$. Since WSETF at any time shares the processor among
jobs with the smallest norm in the ratio of their weights, this is identical to the
behavior of SETF on $\mathcal{I}'$ which works equally on the jobs which have received
the smallest amount of service.



Theorem 2. WSETF is a $(1+\epsilon)$-speed, $O(1/\epsilon^{2+2/p})$-competitive non-clairvoyant
algorithm for minimizing the weighted $\ell_p$ norms of flow time.

Proof. By Lemma 3 we know that if $J_i \in \mathcal{I}$ has flow time $f_i$, then the $w_i$ jobs
in $X_i$ have flow time $f_i$. Thus the $\ell_p$ norm of unweighted flow time for $\mathcal{I}'$ is
$(\sum_i w_i f_i^p)^{1/p}$, which is identical to the weighted $\ell_p$ norm of flow time for $\mathcal{I}$
under WSETF, which implies that

$$WF^p(WSETF, \mathcal{I}, 1) = F^p(SETF, \mathcal{I}', 1) \qquad (4)$$

By Equation 1 we know that $Opt(F^p, \mathcal{I}', 1) \le Opt(WF^p, \mathcal{I}, 1)$. By the main
result of Section 7 in [3] about the competitiveness of SETF for unweighted $\ell_p$
norms of flow time we know that

$$F^p(SETF, \mathcal{I}', 1+\epsilon) = O(1/\epsilon^{2p+2})\, Opt(F^p, \mathcal{I}', 1) \qquad (5)$$

Now, by Equations 4, 5 and 1 we get that

$$WF^p(WSETF, \mathcal{I}, 1+\epsilon) = O(1/\epsilon^{2p+2})\, Opt(WF^p, \mathcal{I}, 1)$$

Thus the result follows.

5 Completion Time Scheduling

In this section, we give a rather generic way to transform an algorithm for a flow
time problem that possibly uses resource augmentation to obtain an algorithm
for the corresponding completion time problem that does not use resource
augmentation. Our transformation carries online algorithms to online algorithms
and also preserves non-clairvoyance. As a corollary of this result we will obtain
$O(1)$-competitive online and non-clairvoyant algorithms for minimizing the
weighted $\ell_p$ norms of completion time.

We first make precise the notion of a completion time measure corresponding
to a flow time measure. A schedule $S$ for $n$ jobs determines the flow
times $f_1, \ldots, f_n$ and the completion times $c_1, \ldots, c_n$. Let $\mathcal{G}$ be some function
that takes as input $n$ real numbers and outputs another real number. Given a
schedule $S$, we define the functions $\mathcal{F}$ and $\mathcal{C}$ as follows:

$$\mathcal{F}(S) = \mathcal{G}(f_1, f_2, \ldots, f_n)$$

$$\mathcal{C}(S) = \mathcal{G}(c_1, c_2, \ldots, c_n)$$

For example, if $\mathcal{G}(x_1, \ldots, x_n) = (\sum_i w_i x_i^p)^{1/p}$, then $\mathcal{F}$ and $\mathcal{C}$ are simply the
weighted $\ell_p$ norms of flow time and completion time respectively.

Our technique for converting a flow time result to a completion time result 
will require two properties from the function Q. 

Scalability: For any positive real number $k$, $\mathcal{G}(kx_1, \ldots, kx_n) =
k\,\mathcal{G}(x_1, \ldots, x_n)$. In particular, if we scale all the flow times in a schedule by
$k$ times then $\mathcal{F}(S)$ increases by $k$ times.

We now motivate the next property that we require from the function $\mathcal{G}$. We
first point out a somewhat surprising property of the $\ell_p$ norms of the completion
time measure. While it is easy to see that minimizing the total weighted flow
time (i.e. $\ell_p$ norm with $p = 1$) is equivalent to minimizing the total weighted
completion time, this is not the case for $p > 1$. In particular, it could be the
case that a schedule which is optimum for the flow time measure $\sum_i f_i^p$ is
suboptimal for the completion time measure $\sum_i c_i^p$ and vice versa.

Consider the following instance with just two jobs. The first job has size 10
and arrives at $t = 0$, the second job has size 1 and arrives at $t = 8$. A simple
calculation shows that in order to minimize the total flow time squared, it is
better to first finish the longer job and then the smaller job. This incurs a total
flow time squared of $10^2 + 3^2 = 109$, whereas the other possibility, which is
to finish the small job as soon as it arrives and then finish the big job, incurs a
total flow time squared of $11^2 + 1^2 = 122$. On the other hand, if we consider
completion time squared, finishing the larger job first incurs a cost of $10^2 + 11^2 = 221$.
If instead we finish the smaller job first, this incurs a cost of $9^2 + 11^2 = 202$. Thus the
optimal schedule for $\ell_p$ norms of flow time need not be optimal for $\ell_p$ norms of
completion time and vice versa.
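The arithmetic of this two-job instance can be checked with a short Python sketch. The helper name `costs` and the encoding of a schedule as a list of completion times are illustrative assumptions, not notation from the paper.

```python
def costs(completions, releases, p=2):
    """Return (sum of flow times^p, sum of completion times^p) for a schedule."""
    flow = sum((c - r) ** p for c, r in zip(completions, releases))
    completion = sum(c ** p for c in completions)
    return flow, completion


releases = [0, 8]  # job 1 (size 10) arrives at 0, job 2 (size 1) arrives at 8

# Long job first: job 1 completes at 10, job 2 at 11.
long_first = costs([10, 11], releases)
# Short job as soon as it arrives: job 2 completes at 9, job 1 at 11.
short_first = costs([11, 9], releases)
```

Finishing the long job first gives the pair (109, 221), while serving the short job on arrival gives (122, 202), so each schedule wins under exactly one of the two measures.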

We say that a function $\mathcal{G}$ is $\rho$-good if it satisfies the following condition:

Given a problem instance $\mathcal{I}$ and any two arbitrary schedules $S$ and $S'$ for $\mathcal{I}$,
if $\mathcal{F}(S) \le c\,\mathcal{F}(S')$, then $\mathcal{C}(S) \le \rho c\,\mathcal{C}(S')$.

Lemma 4. $\mathcal{G}(x_1, \ldots, x_n) = (\sum_i w_i x_i^p)^{1/p}$ is 2-good for all $p \ge 1$.

Our main result is the following:

Theorem 3. Let $\mathcal{G}$ be a $\rho$-good function. If there is an $s$-speed, $c$-competitive
online algorithm with respect to the measure $\mathcal{F}$ (derived from $\mathcal{G}$), then this
algorithm can be transformed into another online algorithm which is 1-speed,
$\rho c s$-competitive with respect to the corresponding completion time measure $\mathcal{C}$.
Moreover, non-clairvoyant algorithms are transformed into non-clairvoyant
algorithms.

We now describe the transformation: 

Let $A$ be an $s$-speed, $c$-competitive algorithm for a flow time problem. Let $\mathcal{I}$
be the original instance where job $J_i$ has release date $r_i$ and size $p_i$. The online
algorithm (which we call $B$) is then defined as follows:

1. When a job arrives at time $r_i$, pretend that it has not arrived till time $s r_i$.

2. At any time $t$, run $A$ on the jobs for which $t \ge s r_i$.
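The two steps of the transformation can be sketched in Python as follows. This is only an illustration under stated assumptions: jobs are encoded as `(release, size)` pairs, and the names `delayed_releases` and `visible_jobs` are hypothetical helpers; the underlying algorithm A is not modelled.

```python
def delayed_releases(jobs, s):
    """Map each job (r_i, p_i) to (s * r_i, p_i): the instance that B hands to A."""
    return [(s * r, p) for r, p in jobs]


def visible_jobs(jobs, s, t):
    """Indices of the jobs that B lets A see at time t, i.e. those with t >= s * r_i."""
    return [i for i, (r, _) in enumerate(jobs) if t >= s * r]


jobs = [(0, 10), (8, 1)]
delayed = delayed_releases(jobs, 2)
seen_at_10 = visible_jobs(jobs, 2, 10)
seen_at_16 = visible_jobs(jobs, 2, 16)
```

With $s = 2$, the second job (released at time 8) is hidden from A until time 16, so at time 10 only the first job is visible.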

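Proof (of Theorem 3). Let $\mathcal{I}'$ be the instance obtained from $\mathcal{I}$ by replacing each job
$J_i \in \mathcal{I}$ by a job $J_i'$ that has release date $s r_i$ and size $s p_i$. Also, let $\mathcal{I}''$ be the
instance obtained from $\mathcal{I}$ by replacing each job $J_i \in \mathcal{I}$ with a job $J_i''$ that has release date
$s r_i$ and size $p_i$.

Let $Opt(\mathcal{F}, \mathcal{I}, x)$ (resp. $Opt(\mathcal{C}, \mathcal{I}, x)$) denote the flow time cost (resp. completion
time cost) of the optimum schedule on $\mathcal{I}$ run using an $x$ speed processor.
We first relate the values of the optimum schedules for $\mathcal{I}$ and $\mathcal{I}'$.

Fact 4. $Opt(\mathcal{C}, \mathcal{I}', 1) = s\,Opt(\mathcal{C}, \mathcal{I}, 1)$

By our resource augmentation guarantee for the algorithm $A$, we know that

$$\mathcal{F}(A, \mathcal{I}', s) \le c\,Opt(\mathcal{F}, \mathcal{I}', 1)$$

By the $\rho$-goodness of $\mathcal{G}$ the above guarantee on flow time implies that

$$\mathcal{C}(A, \mathcal{I}', s) \le c\rho\,Opt(\mathcal{C}, \mathcal{I}', 1) \qquad (6)$$

We now relate $\mathcal{I}'$ to $\mathcal{I}''$.

Fact 5. $\mathcal{C}(A, \mathcal{I}', s) = \mathcal{C}(A, \mathcal{I}'', 1)$

Now, by definition of the algorithm $B$, executing the algorithm $A$ on $\mathcal{I}''$ with
a speed 1 processor is exactly the schedule produced by $B$ on $\mathcal{I}$ using a 1 speed
processor. So the completion times are identical. This implies that

$$\mathcal{C}(B, \mathcal{I}, 1) = \mathcal{C}(A, \mathcal{I}'', 1) \qquad (7)$$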

Now using Facts 4 and 5 and Equations 6 and 7 it follows that

$$\mathcal{C}(B, \mathcal{I}, 1) \le c\rho s\,Opt(\mathcal{C}, \mathcal{I}, 1)$$

Thus we are done.

For $\mathcal{G}(x_1, \ldots, x_n) = (\sum_i w_i x_i^p)^{1/p}$, it is easily seen that the scalability
property is satisfied, and Lemma 4 implies that it is 2-good. Thus by Theorems 1,
2 and 3 we get that

Corollary 1. There exist $O(1)$-competitive clairvoyant and non-clairvoyant
algorithms for minimizing the weighted $\ell_p$ norms of completion time.



Abstract. Two processors receive inputs $X$ and $Y$ respectively. The
communication complexity of the function $f$ is the number of bits (as a
function of the input size) that the processors have to exchange to compute
$f(X, Y)$ for worst case inputs $X$ and $Y$. The List-Non-Disjointness
problem ($X = (x^1, \ldots, x^n)$, $Y = (y^1, \ldots, y^n)$, $x^j, y^j \in Z_2^n$, to decide
whether $\exists j\; x^j = y^j$) exhibits maximal discrepancy between deterministic
($n^2$) and Las Vegas ($\Theta(n)$) communication complexity. Fleischer, Jung,
Mehlhorn (1995) have shown that if a Las Vegas algorithm expects to
communicate $\Omega(n \log n)$ bits, then this can be done with a small number
of coin tosses.

Even with an improved randomness efficiency, this result is extended to
the (much more interesting) case of efficient algorithms (i.e. with linear
communication complexity). For any $R \in N$, $R$ coin tosses are sufficient
for $O(n + n^2/2^R)$ transmitted bits.



1 Introduction 

For many computations, communication is the decisive bottleneck. For example,
in order to multiply two integers of length $n$ each on a VLSI chip, it is necessary
and sufficient to have an area time product $AT^2 = \Theta(n^2)$, for the whole range
of meaningful times ($c \log n \le T \le d\sqrt{n}$). Such a result is obtained by partitioning
the chip into two parts, viewing each part as a separate computing agent and 
considering the required communication between the two agents. For a given 
time, this communication requires a certain bandwidth implying a bound on the 
width and area of the chip. 

This is just one example illustrating the fact that to analyze the complexity
of certain algorithms, it is useful to study the communication complexity in the
two agent model introduced by Yao [11]. Two agents want to compute a function
$f$. They receive inputs $X$ and $Y$ respectively. The communication complexity
$C$ assigns to every input size $n$ the number of bits that have to be transmitted
between the two agents, before one of them knows the value $f(X, Y)$ for worst
case inputs $X$ and $Y$ of size $n$ each (see Figure 1).

[Figure: one agent receives the left input $X$, the other the right input $Y$; they exchange back and forth messages until one agent knows $f(X, Y)$.]

Fig. 1. Two agent communication complexity



The purpose of this paper is to study the effect of limited randomness on the 
communication complexity for reliable (i.e., Las Vegas) computations. Rather 
than using communication complexity as a tool to analyze the speed of algo- 
rithms, we want to design clever communication algorithms (protocols) to speed 
up the communication time, in order to study the tradeoff between communica- 
tion and randomness. 

We focus on one particular example that exhibits extremal behavior. The
inputs $X$ and $Y$ are $n \times n$ matrices with entries from $\{0, 1\}$. Note that in this
case, the size $n$ of the input is not the number of bits in its representation.
The List-Non-Disjointness problem (LND) asks whether there is a $j$ such that
matrices $X$ and $Y$ agree in their $j$th columns $x^j$ and $y^j$ respectively.

$$LND(X, Y) = \begin{cases} 0 & \text{if } \forall j\; x^j \ne y^j \\ 1 & \text{if } \exists j\; x^j = y^j \end{cases}$$

Figure 2 presents the List-Non-Disjointness problem with an example for
which $LND(X, Y)$ is 1 (or true).
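As a point of reference, the function itself (not a communication protocol) can be written in a few lines of Python. The matrices below are the example of Figure 2 as far as it can be read from the figure; the function name `lnd` and the row-major list encoding are assumptions made for this sketch.

```python
def lnd(X, Y):
    """Return 1 if some column j of X equals column j of Y, else 0."""
    n = len(X)
    for j in range(n):
        if all(X[i][j] == Y[i][j] for i in range(n)):
            return 1
    return 0


# Example matrices (rows listed top to bottom).
X = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 1, 0, 1],
     [0, 1, 1, 1]]
Y = [[0, 0, 1, 1],
     [1, 0, 0, 1],
     [0, 1, 0, 1],
     [1, 0, 1, 1]]

result = lnd(X, Y)
```

Here the third columns of both matrices are $(1, 0, 0, 1)$, so the result is 1.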

Mehlhorn and Schmidt [8] have introduced the LND problem. They have
determined its deterministic communication complexity

$$C_{Det}(LND) = \Theta(n^2)$$

and provided a good upper bound on its Las Vegas communication complexity

$$C_{Las\,Vegas}(LND) = O(n \log^2 n)$$

The communication complexity $C_{Las\,Vegas}$ measures the expected number of bits
transferred for a worst case input.

Aho, Ullman and Yannakakis [1] have shown that this result is almost optimal.
They have proved that for any function $f$, even the nondeterministic
communication complexity (where both computing agents can guess, but are
not allowed to output a wrong value for $f(X, Y)$) is bounded by

$$C_{NDet}(f) = \Omega(\sqrt{C_{Det}(f)})$$

trivially implying

$$C_{Las\,Vegas}(f) = \Omega(\sqrt{C_{Det}(f)})$$

[Figure: Left Input $X = (x^1 \cdots x^n)$ and Right Input $Y = (y^1 \cdots y^n)$, e.g.

$$X = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{pmatrix}$$

One agent knows $LND(X, Y)$.]

Fig. 2. The List-Non-Disjointness Problem (LND) with an example

Indeed, using a more efficient communication algorithm, it has been shown
that this lower bound is tight [6], because

$$C_{Las\,Vegas}(LND) = O(\sqrt{C_{Det}(LND)})$$

i.e., LND exhibits the maximal possible discrepancy between deterministic and
Las Vegas communication complexity. The communication algorithm exhibiting
this discrepancy has not been optimized for its use of randomness.

In a more recent paper, Fleischer, Jung and Mehlhorn [5] have shown that
instead of jumping directly from deterministic to Las Vegas protocols, one could
interpolate between these two extremes by considering coin tosses as a limited
resource. The paper shows that by increasing the number of random bits from
$R$ to $R + \log(R + 1) + O(1)$,¹ one can provably decrease the communication
complexity of LND for every $R$ with

$$0 \le R \le \log n - \log\log n$$

This range of randomness corresponds to a range of communication between
$\Theta(n^2)$ (deterministic case, $R = 0$) and $\Theta(n \log n)$ (using almost $\log n$ random
bits). The interesting case of communication efficient computations, i.e.,
communication $O(n)$, is left out, because it cannot be handled by the communication
protocol based on prime numbers [6,5].

Solving two of the three open problems in Fleischer, Jung and Mehlhorn [5], 
we improve their results in two respects: 

— The range of the communication complexity is extended all the way down
to the interesting region of $O(n)$. Before, only the number of random bits
required for inefficient algorithms has been determined.



¹ log denotes the logarithm to the base 2




An Improved Communication-Randomness Tradeoff 447 



— The randomness is used more efficiently such that even increasing the number
of random bits by 3 provably decreases the communication complexity.

Our improved communication algorithm is based on a simplified protocol [7] for 
the LND problem using ideas from universal hashing [2] . This protocol had been 
developed to exhibit a function with 

= 0{n^) 



but 



^^Las Vegas = 6>(n polylog n) 
It is conjectured that for all functions / 

"^^Las Vegas = ^ ^\/^Tbet) 



2 Definitions 

There are two precise definitions of communication complexity in use. In the 
traditional definition, the two agents alternate between sending and receiving one 
bit [11]. In some literature focusing on the number of communication rounds [10], 
obviously longer messages are allowed. Both definitions are fine for most of our 
purposes, as we do not focus on communication rounds. But some of our results 
are so precise that constant factors matter. For these results, we don’t want to 
use the first definition, as it might waste up to a factor of 2, and we don’t want 
to use the second definition, as it implicitly includes an end-of-message signal 
that is sent for free. 

Definition 1. A two agent Communication Algorithm $A$ (also called protocol)
for a decision problem (i.e., a function $f : \{0,1\}^* \times \{0,1\}^* \to \{accept, reject\}$)
is given by 3 functions $p_{\mathcal{L}}, p_{\mathcal{R}} : \{0,1\}^* \times \{0,1\}^* \to \{0, 1, accept, reject\}$ and
$t : \{0,1\}^* \to \{\mathcal{L}, \mathcal{R}\}$.

When the input is $(X, Y)$, and so far the string $w \in \{0,1\}^*$ has been
communicated (initially, the empty string has been communicated), then

— if $t(w) = \mathcal{L}$ and $p_{\mathcal{L}}(X, w) \in \{0, 1\}$ then agent $\mathcal{L}$ sends bit $p_{\mathcal{L}}(X, w)$ to $\mathcal{R}$
(and concatenates this bit to $w$),

— if $t(w) = \mathcal{R}$ and $p_{\mathcal{R}}(Y, w) \in \{0, 1\}$ then agent $\mathcal{R}$ sends bit $p_{\mathcal{R}}(Y, w)$ to $\mathcal{L}$
(and concatenates this bit to $w$),

— if $p_{t(w)} \notin \{0, 1\}$ then the communication ends with output
$f(X, Y) = p_{t(w)}$.

The function $t$ tells whose turn it is (to send a message or to stop). Both
agents know the communicated string $w$ and thus $t(w)$. The agent whose turn
it is may stop accepting or rejecting or send a bit to the other agent. Maximal
consecutive bits sent by the same agent are called a message. Intuitively, the
algorithm stops when one agent knows the output while the other agent waits
for a message.

The length of a communication is the length $|w|$ of the communicated string
$w$ when Algorithm $A$ stops. We write $w = A(X, Y)$, as $w$ is the function of
$(X, Y)$ computed by Algorithm $A$.

Note that as we are using a non-uniform computation model, there really is 
a separate algorithm for every input size n. 

This definition truly counts the number of bits sent, because the information 
about the end of a message is contained in the message itself. Note that both 
agents can compute the function t telling whose turn it is to send the next bit. 

Definition 2. For a given communication algorithm $A$, the deterministic
communication complexity is defined by

$$C_{Det}[A](n) = \max\{|w| : w = A(X, Y) \wedge size(X, Y) = n\}$$

For a function $f$, the deterministic communication complexity is defined by

$$C_{Det}(n) = \min_A C_{Det}[A](n)$$

where $A$ ranges over all communication algorithms computing $f$.

Communication algorithms defined so far are also called “deterministic com- 
munication algorithms.” 

In a Las Vegas communication algorithm for /, the agents are allowed to 
toss coins. Nevertheless, the computed function has to be / for every outcome 
of the coin tosses. In this case, the communication complexity is the expected 
number of bits transmitted (expectation over the sequence of coin tosses) . Here, 
we only consider ideal coins. Furthermore, we restrict ourselves to randomized 
algorithms where the number of coin tosses only depends on the input size, and 
both agents want to know the values of all random bits. Thus w.l.o.g., all coin
tosses are done at the beginning by agent $\mathcal{L}$ who sends the outcome to $\mathcal{R}$.

Our assumption that all coin tosses are done at the beginning implies that
a Las Vegas algorithm $A$ (for some input size $n$) really consists of a collection of
deterministic algorithms $A_r$ (one for each sequence of coin tosses $r \in \{0,1\}^R$).

Definition 3. A standard Las Vegas communication algorithm using $R = R(n)$
random bits is defined by $2^{R(n)}$ deterministic communication algorithms $A_r$
($r \in \{0,1\}^{R(n)}$) for every input size $n$. It starts with agent $\mathcal{L}$ tossing $R(n)$
coins to obtain the string $r$ and sending $r$ to agent $\mathcal{R}$. Then the agents simulate
Algorithm $A_r$ without any further coin tosses. Furthermore, the complete
algorithm is required to be Las Vegas, i.e., to produce the same output for every
random sequence $r$.

Our Las Vegas communication algorithms are standard, while for lower
bounds, also algorithms compete where both agents may toss coins at any time.
Thus for a given standard Las Vegas communication algorithm $A$, using $R(n)$
random bits, the Las Vegas communication complexity $C_{Las\,Vegas}[A]$ is defined
by

$$C_{Las\,Vegas}[A](n) = R(n) + 2^{-R(n)} \sum_{r \in \{0,1\}^{R(n)}} C_{Det}[A_r](n)$$

For given functions $f$ and $R$, the Las Vegas communication complexity is
defined by

$$C^R_{Las\,Vegas}(n) = \min_A C_{Las\,Vegas}[A](n)$$

where $A$ ranges over all Las Vegas communication algorithms computing $f$ using
$R(n)$ random bits. Finally, we define

$$C_{Las\,Vegas}(n) = \min_R C^R_{Las\,Vegas}(n)$$



3 Results 

Various rank functions produce lower bounds for the communication complexity 
[8,4,5]. The rank function has been used to get the following lower bound 

for the Las Vegas communication complexity of LND with limited randomness. 

Theorem 1. (Lower Bound [5])

For $0 \le R \le \log n$, $C^R_{Las\,Vegas}(LND) \ge n^2/2^R$.

Actually, this lower bound holds for all non-negative integers R, but the 
result is trivial for R > log n, because every Las Vegas algorithm has to transfer 
at least n bits before accepting an input of LND. 

For R < log n, this lower bound has been partially matched by the following 
upper bound [5]. 



^£lsVegas(LND)=0(^) 

The best result implied by this theorem without a limit on the coin tosses is 

^Las vegas(LND) = O(nlogn) 

We want to present a communication algorithm A providing a better match 
for the lower bound. Not only is the communication algorithm A itself a ran- 
domized algorithm, but also the selection of A is done with a stochastic process. 
We just want to show the existence of a good algorithm. This is guaranteed as 
soon as the probability for the stochastic process to deliver a good algorithm is 
positive. 

If the stochastic process did not guarantee anything better, one might rightly
be hesitant to use such an algorithm $A$. Fortunately, things are much better. First
of all, the stochastically produced algorithm $A$ is unconditionally correct, only
its running time is in question. Furthermore, with high probability the expected
time complexity of the stochastically produced algorithm $A$ does not exceed the
expected time complexity of an optimal algorithm by more than a tiny fraction.

Parameter Selection:

for $(r, i) \in \{0, \ldots, 2^R - 1\} \times \{1, \ldots, n\}$
    choose $z_i^{(r)} \in Z_2^n$ uniformly at random
    if $z_i^{(r)}$ is linearly dependent of $z_1^{(r)}, \ldots, z_{i-1}^{(r)}$
    then $z_i^{(r)}$ is replaced by any vector of $Z_2^n$ which is
        linearly independent of $z_1^{(r)}, \ldots, z_{i-1}^{(r)}$

Fig. 3. The Randomized Parameter Selection Algorithm

We interpret the randomized selection of an algorithm $A$ as the randomized
selection of certain parameters $Z^{(r)}$ of a single algorithm $A$. For each
$r \in \{0, \ldots, 2^R - 1\}$, $Z^{(r)}$ is an $n \times n$ $\{0,1\}$-matrix with $z_i^{(r)}$ being its $i$th
row. The selection of these parameters is described in the Parameter Selection
Algorithm of Figure 3.

The basic idea of Algorithm $A$ is quite simple. Instead of transmitting a
column $x^j$ of $X$ to compare it with column $y^j$ of $Y$, we could just as well transmit
the matrix product $Z^{(r)} x^j$ for a regular matrix $Z^{(r)}$ and compare it with $Z^{(r)} y^j$,
but obviously, there would be no advantage in doing so. But if we replace the
whole matrix $Z^{(r)}$ by just some rows of it, then we have a nice short hash value
to transmit instead of the whole $x^j$. If $y^j$ hashes to a different value (for the
same hash function), then we have discovered a difference between $x^j$ and $y^j$
without transferring all of $x^j$. To reduce the chance of a collision of hash values
for distinct $x^j$ and $y^j$, we employ a randomized selection of the hash functions
defined by the matrices $Z^{(r)}$.
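The hashing idea can be sketched in Python: rows of a matrix over $Z_2$ are applied one at a time to both columns, and the comparison stops at the first hash bit that differs. The names `inner_gf2` and `bits_until_difference` are hypothetical, and the identity matrix in the example is only a stand-in for a (randomly selected) regular matrix.

```python
def inner_gf2(z, x):
    """Scalar product of two 0/1 vectors over Z_2."""
    return sum(a & b for a, b in zip(z, x)) % 2


def bits_until_difference(rows, x, y):
    """Number of hash bits compared until the rows separate x from y.

    Returns None if no row distinguishes the two columns. For a regular
    (invertible) matrix this happens only when x == y.
    """
    for count, z in enumerate(rows, start=1):
        if inner_gf2(z, x) != inner_gf2(z, y):
            return count
    return None


# Stand-in for a regular matrix: the 4 x 4 identity.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
first_diff = bits_until_difference(identity, [1, 0, 0, 1], [1, 0, 1, 1])
no_diff = bits_until_difference(identity, [1, 0, 0, 1], [1, 0, 0, 1])
```

With the identity matrix the difference shows up only at the third bit; the point of choosing the rows at random is that, for distinct columns, each hash bit differs with probability 1/2, so a difference is expected after about two bits.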

The Randomized Parameter Selection Algorithm (Figure 3) preselects for
each $r \in \{0, \ldots, 2^R - 1\}$ a random sequence

$$z_i^{(r)} \quad (i = 1, \ldots, n)$$

of vectors to hash the columns $x^j$ of $X$ and $y^j$ of $Y$ by scalar products. This initial
random selection can be slightly improved by replacing useless vectors (linearly
dependent on previous ones) by good ones.
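A possible Python sketch of the selection for a single value of $r$ is given below. It is an illustration under assumptions: vectors of $Z_2^n$ are encoded as $n$-bit integers, linear dependence is tested with an XOR basis, and a dependent vector is replaced by a unit vector (the text allows any independent vector). The names `reduce_gf2` and `select_parameters` are hypothetical.

```python
import random


def reduce_gf2(v, basis):
    """Reduce bitmask v against an XOR basis indexed by leading bit position."""
    while v:
        lead = v.bit_length() - 1
        if lead not in basis:
            return v
        v ^= basis[lead]
    return 0


def select_parameters(n, rng):
    """Pick n linearly independent vectors of Z_2^n for one value of r."""
    basis, rows = {}, []
    for _ in range(n):
        z = rng.getrandbits(n)  # uniformly random vector
        reduced = reduce_gf2(z, basis)
        if reduced == 0:
            # z depends on the earlier vectors: replace it by a unit vector
            # whose leading bit is not yet covered by the basis.
            z = next(1 << b for b in range(n) if b not in basis)
            reduced = reduce_gf2(z, basis)
        basis[reduced.bit_length() - 1] = reduced
        rows.append(z)
    return rows


rows = select_parameters(8, random.Random(0))
```

Running this once per $r \in \{0, \ldots, 2^R - 1\}$ would give the full parameter set; the resulting rows for each $r$ form a regular matrix by construction.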

With these parameters selected, the randomized algorithm $A$ (Figure 4) starts
by first tossing $R$ coins to select $r \in \{0, \ldots, 2^R - 1\}$. It then proceeds
deterministically, transmitting for each column $x^j$ a sequence of scalar products until a
difference to column $y^j$ is discovered. If instead, one pair of columns is found
equal, the algorithm stops immediately.

To obtain the precise constant factors as claimed in the theorem, Algorithm
$A$ is slightly optimized. $\mathcal{L}$ transmits several bits as one message, but receives a
one bit answer indicating whether a difference between $x^j$ and $y^j$ has just been
detected. Giving an answer after every bit is a good strategy for a random input,
but giving an answer after several bits is a slightly better strategy for worst case
inputs with respect to the previously selected parameters $Z^{(r)}$.







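Algorithm A:

$\mathcal{L}$ tosses $R$ coins to form the binary number $r \in \{0, \ldots, 2^R - 1\}$
$\mathcal{L}$ sends $r$ (to $\mathcal{R}$)
$m \leftarrow 0$
for $j \leftarrow 1$ to $n$ do
    for $i \leftarrow 1$ to $n$ do
        $m \leftarrow m + 1$
        $\mathcal{L}$ sends $w_m = z_i^{(r)} x^j$ (inner product of vectors)
        if $w_m = z_i^{(r)} y^j$ then equal = 1 else equal = 0
        if $i = n \wedge$ equal = 1 then $\mathcal{R}$ halts accepting
        if $j = n \wedge$ equal = 0 then $\mathcal{R}$ halts rejecting
        $m \leftarrow m + 1$
        $\mathcal{L}$ receives the answer $w_m =$ equal (from $\mathcal{R}$)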



Fig. 4. The Algorithm 



We use a Chernoff bound to obtain our upper bound. We use the version on
page 70 in the book of Motwani and Raghavan [9] specialized to the simple case of
a sequence of Bernoulli trials.

Theorem 2. (Chernoff Bound [3,9])

Let $X_1, X_2, \ldots, X_m$ be independent Bernoulli trials, such that, for $1 \le i \le m$,
$\Pr[X_i = 1] = p$, where $0 < p < 1$. Then, for $X = \sum_{i=1}^{m} X_i$, $\mu = E[X] = mp$,
and $0 < \delta \le 1$,

$$\Pr[X < (1 - \delta)\mu] < \exp(-\mu\delta^2/2)$$

We only use the case $p = \frac{1}{2}$ implying $\mu = \frac{1}{2}m$. Furthermore, we let $m = \lceil \alpha n \rceil$
for some constant $\alpha > 1$ and $(1 - \delta)\mu = 2^R$.

Theorem 3. For $m = \lceil \alpha n \rceil$, let $X_1, X_2, \ldots, X_m$ be independent Bernoulli trials,
such that, for $1 \le i \le m$, $\Pr[X_i = 1] = \frac{1}{2}$. Let $X = \sum_{i=1}^{m} X_i$. Then

$$\Pr[X < 2^R] < 2^{-n}$$

holds for $\alpha \ge \beta = 2(1 + \ln 2 + \sqrt{2 \ln 2 + \ln^2 2})$ (which is $< 6.11888$) and $2^R \le n$.

$$\Pr[X < 2^R] < 2^{-n}$$

also holds for $\alpha > \gamma = 4 \ln 2$ (which is $< 2.77259$), $R = O(1)$ and $n$ sufficiently
large.

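Proof. We apply Theorem 2 with $p = \frac{1}{2}$ (implying $\mu = \frac{1}{2}m$), $m = \lceil \alpha n \rceil$, and
$(1 - \delta)\mu = 2^R$. Then

$$\delta = 1 - \frac{2^R}{\mu} = 1 - \frac{2^{R+1}}{\lceil \alpha n \rceil}$$

implying

$$\Pr[X < 2^R] = \Pr[X < (1 - \delta)\mu] < \exp(-\mu\delta^2/2) = \exp(-\lceil \alpha n \rceil \delta^2/4) \le \exp(-\alpha n \delta^2/4)$$

Thus

$$\Pr[X < 2^R] < 2^{-n}$$

if

$$\frac{1}{4}(\log e)\,\alpha\,\delta^2 \ge 1$$

or equivalently

$$\frac{1}{4}(\log e)\,\alpha \left(1 - \frac{2^{R+1}}{\lceil \alpha n \rceil}\right)^2 \ge 1$$

In other words, the quadratic inequality in $\alpha$

$$\frac{1}{4}(\log e)\,\alpha \left(1 - \frac{2^{R+1}}{\alpha n}\right)^2 \ge 1$$

is sufficient to obtain

$$\Pr[X < 2^R] < 2^{-n}$$

For this quadratic inequality in $\alpha$, we are only interested in the solutions $\alpha$
with $\delta > 0$, i.e.,

$$\frac{2^{R+1}}{\lceil \alpha n \rceil} < 1$$

These solutions are

$$\alpha \ge 2\left(\frac{2^R}{n} + \ln 2 + \sqrt{2\,\frac{2^R}{n} \ln 2 + \ln^2 2}\right)$$

Hence, for $2^R \le n$, any $\alpha \ge 2(1 + \ln 2 + \sqrt{2 \ln 2 + \ln^2 2})$ (a bound that is $< 6.11888$; e.g., $\alpha = 7$)
is sufficient for all $n$. For constant $R$, and $n$ sufficiently large, any $\alpha > 4 \ln 2$
(a bound that is $< 2.77259$; e.g., $\alpha = 2.8$) is sufficient.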
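The numerical constants can be checked with a few lines of Python. The sketch assumes that the threshold on $\alpha$ has the form $2(\rho + \ln 2 + \sqrt{2\rho \ln 2 + \ln^2 2})$ with $\rho = 2^R/n$, which is what solving the quadratic inequality $\frac{1}{4}(\log e)\,\alpha\,(1 - 2\rho/\alpha)^2 \ge 1$ gives; the function names are hypothetical.

```python
import math

LN2 = math.log(2)


def alpha_threshold(ratio):
    """Smallest admissible alpha for ratio = 2^R / n (assumed closed form)."""
    return 2 * (ratio + LN2 + math.sqrt(2 * ratio * LN2 + LN2 ** 2))


def quadratic_lhs(alpha, ratio):
    """Left-hand side (1/4) * log2(e) * alpha * (1 - 2*ratio/alpha)^2."""
    return alpha * (1 - 2 * ratio / alpha) ** 2 / (4 * LN2)


beta = alpha_threshold(1.0)   # worst case 2^R = n
gamma = alpha_threshold(0.0)  # limit for constant R and n -> infinity
```

With these definitions `beta` evaluates to roughly 6.11887 and `gamma` to roughly 2.77259, and `quadratic_lhs(beta, 1.0)` is 1 up to rounding, which is consistent with the bounds quoted in Theorem 3.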



Theorem 4. With R bits of randomness, the LND problem can be decided 
within communication complexity 



^Las Vegasi^) < R + O!-^ + 2 




for any $\alpha \ge \beta$ of Theorem 3.

Proof. The first term is from sending the random bits, while the second term is
a bound on the expected number of hash bits for a worst case input {X, Y) with 
disjoint lists. The third term represents the expected number of answering bits 
after selecting a message length of \J anj^^. An extra term for the case of non- 
disjoint lists can be avoided with a simple trick. With probability 1/2 (just reuse 
the first random bit) the left and right halves of the matrix X are swapped. The 
same is done with matrix Y . As a consequence, it is always expected to discover 
equal columns sufficiently early. □ 



4 Open Problems 

The main open problem in this area is the question whether there is a more 
intelligent combinatorial construction of the Las Vegas algorithm (parameter 
selection) than by random choice. Upper and lower bounds are already very 
close together. Nevertheless such a construction might decrease the constant 
factor gap between upper and lower bounds. More important, a better explicit 
construction seems like an interesting and challenging problem in combinatorics. 

Clearly for very small $R$ some improvement is possible. For $R = 1$ an optimal
choice of parameters is the following: $z_1^{(0)}, \ldots, z_n^{(0)}$ is any sequence of basis
vectors of $Z_2^n$, and $z_1^{(1)}, \ldots, z_n^{(1)}$ is the reverse sequence $z_n^{(0)}, \ldots, z_1^{(0)}$.

Another remaining open problem in this area is the extension of limited
randomness results to Las Vegas bounded round communication complexity.



Abstract. We introduce distributed games over asynchronous transition
systems to model a distributed controller synthesis problem. A
game involves two teams and is not turn-based: several players of both
teams may simultaneously be enabled. We define distributed strategies
based on the causal view that players have of the system. We reduce
the problem of finding a winning distributed strategy with a given
memory to finding a memoryless winning distributed strategy in a larger
distributed game. We reduce the latter problem to finding a strategy
in a classical 2-player game. This allows results to be transferred from the
sequential case to this distributed setting.

Keywords. Distributed game, distributed control, distributed strategy. 



1 Introduction 

The controller synthesis problem has been widely investigated by many authors 
for different system types (sequential, concurrent, timed, probabilistic) and dif- 
ferent specification languages (linear or branching temporal logics for instance). 
The variant addressed here, called distributed controller synthesis problem, is 
the following. We are given a distributed reactive system executing a program 
in some environment, modeled by an asynchronous transition system [13] made 
up of several processes. It can perform local actions and synchronization actions 
involving several processes. Such an action first reads states of the participat- 
ing processes and, depending on what is read, chooses a transition changing 
the states of the involved processes. Interpreting processes as memory locations, 
this suggests communication via shared memory, and explains the terminology 
adopted in the paper. However, one can as well simulate with these asynchronous 
systems other communication paradigms, such as point-to-point channels. An- 
other advantage of this model is to handle actions of the environment just as 
actions of the reactive system. We are also given a specification, i.e., a property
expressing behaviors one wants to ensure for the system. The distributed
controller synthesis problem is then to compute a distributed controller on the
same process set (actions of the controller observe local states of some processes).
With this information, the controller has to enable or disable controllable actions
of the system, so that the overall system behaves correctly according to
the specification.

* Work partly supported by the European research project HPRN-CT-2002-00283

The problem is known to be decidable and to have an optimal solution in the 
sequential case [10], i.e., when there is a single process. For distributed systems 
however, the situation is more involved. Several models have been considered 
so far (formulated in control or in game theory terminology). For synchronous 
processes communicating via buffers, the problem is undecidable [9] for LTL 
specifications except for very few communication architectures. Recent works [5, 
4] extend this result, e.g., for local specifications (talking only of actions of single 
processes). The approach of [7] unifies [9,5,4]. In all settings, a major reason for 
undecidability is when the specification language makes it possible to express 
properties of an observed linearization of process actions, ignoring their possible 
concurrency. Another distributed model is studied in [6] , and it is shown that the 
existence of specific controllers is decidable for specification languages making 
no distinction between linearizations of the same concurrent execution. 

Systems used in this paper subsume those of [9, 5, 4, 7] and [6]. In [6], global 
transitions of the system are obtained by synchronizing local transitions of pro- 
cesses, and transitions of the environment are local. Here in contrast, a transition 
of a synchronizing action also depends on the states of other involved processes, 
so that transition functions of actions are not necessarily a cartesian product of 
local transition functions. Further, environment moves can be defined globally. 
Systems of [7], in which the environment is global and transitions of the system 
are purely local (process communication flows through the environment) can 
also be modeled naturally in our framework. Another difference is that [6,7] use 
local memory controllers, based on the history of process local states (cf. Sec. 5). 
We use causal memory: a controller can remember information collected from 
other processes along the computation. The existence of a distributed controller 
in the settings of [6,7] implies the existence of a controller in our setting. As 
the converse does not hold, one cannot transfer immediately the undecidability 
results of [6,7] to our case. 

Our primary goal is to model the distributed controller synthesis problem 
by games. Sequential two-player games already provide a natural and widely used
context to model sequential reactive systems [8,11,14,12]. Player 0 represents the 
system and player 1 represents the environment. The rules of the game describe 
the possible interactions between them, and the winning condition for player 0 
expresses the specification that the system should meet. Thus, deciding whether 
player 0 has a winning strategy corresponds to deciding whether the system can 
be controlled to meet the specification, and computing a winning strategy for 
player 0 corresponds to solving the controller synthesis problem. 

Distributed games proposed in this paper are well suited to the model of
asynchronous systems and supply a natural framework for studying the distributed
controller problem. Two teams play one against the other. Players of team 0 may
be viewed as controllable actions of a distributed system which cooperate in
order to meet the specification, no matter how the environment (team 1) behaves.
All players use a pool of shared variables to transmit information. The game is
not turn-based: in each position of the game, several players of team 0 or team
1 may be simultaneously enabled. Thus, the game is asynchronous, contrary to
the setting of [7] where at each stage players act synchronously. Assuming that
dependencies between actions are fixed (i.e., do not depend on the context), a
play is then a Mazurkiewicz trace.

We next define the notion of distributed strategy for a team in such a game. 
Roughly speaking, a strategy is distributed if any move it predicts for a player 
only depends on the causal view of that player. In this context, there exist games 
in which neither team 0 nor team 1 has a winning distributed strategy.

We first show that, as in the sequential framework, one can transform a game
$G$ for which team 0 has a winning distributed strategy with memory $\mu$ into a
game $G^\mu$ for which team 0 has a winning memoryless strategy. If $G$ and the
memory are finite, then so is $G^\mu$. We further transform a distributed game $G$
into a classical two-player game $\overline{G}$, such that team 0 has a memoryless
winning distributed strategy in $G$ if and only if player 0 has a memoryless
winning strategy in $\overline{G}$. This result is effective: $\overline{G}$ can be
effectively constructed and from a winning strategy of $\overline{G}$ we
can effectively construct a winning distributed strategy for $G$ and vice versa. We
then show that if the winning condition is a recognizable trace language, then 
one can decide whether team 0 has a memoryless distributed winning strategy, 
and compute it. The restriction to recognizable specifications is not artificial 
and makes it possible to express a relationship between the architecture of the 
game and the specification. Moreover, in practice, recognizable languages cover 
most interesting properties. As in [6,7], specifications depending on the order of 
independent actions lead to undecidability. 

Finally, we explain how to simulate distributed games of [7] in our context. 
The goals of the two kinds of games are quite different. The aim of [7] is to unify 
different approaches and to find generic transformations on distributed games 
(e.g., the reduction of the number of players) to get decidability results, while
we first focused on a natural model which we then reduced to sequential games. 
Due to space constraints, proofs are omitted. 

2 Preliminaries 

In this section, we briefly recall definitions of pomsets and Mazurkiewicz traces. 
The reader is referred to [3,2] for details. 

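If $(V, \le)$ is a poset and $S \subseteq V$, we let
$\downarrow S = \{e \in V \mid \exists s \in S,\ e \le s\}$. When
$e \in V$ then we simply write $\downarrow e$ for $\downarrow\{e\}$ and we let
$\Downarrow e = \downarrow e \setminus \{e\}$. The successor
relation associated with the partial order $<$ is $\lessdot \;=\; < \setminus <^2$.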

A pomset over an alphabet $\Sigma$ is a tuple $(V, \le, \ell)$ where $(V, \le)$ is a
poset, and $\ell : V \to \Sigma$ is a mapping called the labeling. Elements of $V$ are
called events or vertices. Two pomsets $t = (V, \le, \ell)$ and $t' = (V', \le', \ell')$
are isomorphic, written $t \sim t'$, if there exists a bijection $\varphi : V \to V'$
such that $\ell' \circ \varphi = \ell$ and for all $e, f \in V$, $e \le f$ iff
$\varphi(e) \le' \varphi(f)$. If $\Sigma = \Sigma_1 \times \Sigma_2$ and
$\ell(e) = (\ell_1(e), \ell_2(e))$, we write $(\ell_1, \ell_2)$
(or even $\ell_1, \ell_2$) instead of $\ell$. If $t = (V, \le, \ell)$ is a pomset, we
denote by $\max(t)$ (resp. by $\min(t)$) the set of maximal (resp. minimal) elements
of $t$. The alphabet of $t$ is $\mathrm{alph}(t) = \ell(V)$. We let
$\mathrm{alphinf}(t) = \{a \in \mathrm{alph}(t) \mid \ell^{-1}(a) \text{ is infinite}\}$.

A dependence alphabet is a pair $(\Sigma, D)$ where $\Sigma$ is a finite alphabet and $D$
is a reflexive, symmetric binary relation over $\Sigma$, called the dependence relation.
We let $I = \Sigma \times \Sigma \setminus D$ be the independence relation. A
(Mazurkiewicz) trace over $(\Sigma, D)$ is an isomorphism class of a pomset
$(V, \le, \ell)$ such that, for all $e, f \in V$:
(1) $\ell(e)\,D\,\ell(f) \Rightarrow e \le f$ or $f \le e$,
(2) $e \lessdot f \Rightarrow \ell(e)\,D\,\ell(f)$ and (3) $\downarrow e$ is finite. Two
traces $t, t'$ are independent if
$(\mathrm{alph}(t) \times \mathrm{alph}(t')) \cap D = \emptyset$. We denote by
$\mathbb{R}(\Sigma, D)$ (resp. by $\mathbb{M}(\Sigma, D)$) the set of traces (resp. of
finite traces) over $(\Sigma, D)$. It is well-known that $\mathbb{M}(\Sigma, D)$ is a
monoid. The free monoid (resp. the free semigroup) over $\Sigma$ is denoted by
$\Sigma^*$ (resp. by $\Sigma^+$).

A prefix of $t = (V, \le, \ell)$ is a trace $(U, \le, \ell)$, where $U \subseteq V$
satisfies $\downarrow U = U$. We write $s \le t$ if $s$ is a prefix of $t$. A
linearization of $t$ is a labeled total order $(V, \preceq, \ell)$ such that $e \le f$
implies $e \preceq f$. For any $w \in \Sigma^*$, there exists a unique
trace $[w]$ of which $w$ is a linearization.
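
As a small illustration of the last statement, the following sketch decides whether two finite words are linearizations of the same trace. It relies on the standard projection criterion for trace monoids (two words are equivalent iff their projections onto every pair of dependent letters coincide), which is not stated in the text above. The function name `trace_equivalent` and the toy dependence alphabet are assumptions made for this example only.

```python
from itertools import combinations_with_replacement


def trace_equivalent(u, v, alphabet, dependence):
    """Return True iff the words u and v are linearizations of the same trace.

    `dependence` is a set of pairs of letters, assumed reflexive and symmetric.
    Both words are assumed to use letters of `alphabet` only.
    """
    for a, b in combinations_with_replacement(sorted(alphabet), 2):
        if (a, b) not in dependence:
            continue  # independent letters may be reordered freely
        # Keep only the occurrences of a and b, in their original order.
        proj_u = [x for x in u if x in (a, b)]
        proj_v = [x for x in v if x in (a, b)]
        if proj_u != proj_v:
            return False
    return True


# Hypothetical dependence alphabet: a and b are dependent, c is independent of both.
ALPHABET = {"a", "b", "c"}
DEPENDENCE = {(x, x) for x in ALPHABET} | {("a", "b"), ("b", "a")}

print(trace_equivalent("acb", "cab", ALPHABET, DEPENDENCE))  # c commutes with a
print(trace_equivalent("ab", "ba", ALPHABET, DEPENDENCE))    # a and b do not commute
```

The pairs (a, a) are included in the loop on purpose: they compare the number of occurrences of each letter.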



3 Distributed Games 

A distributed system made up of asynchronous processes interacting together 
and with the environment may be viewed as a single asynchronous model having 
controllable actions (those of the system) and uncontrollable actions (those of the
environment). In the game setting, one views actions as players, which are split
into two teams $\Sigma_0$ (actions of the system) and $\Sigma_1$ (actions of the environment).
An execution of the system inside the environment corresponds then to a play, 
a property of the executions to a winning condition, and a distributed controller 
to a winning distributed strategy for team 0. 

If $X$ and $I$ are sets and $J \subseteq I$, then for $x = (x_i)_{i \in I} \in X^I$ we let
$x_J = (x_i)_{i \in J} \in X^J$. Given sets $(X_i)_{i \in I}$, and $J \subseteq I$, we let
$X_J = \prod_{i \in J} X_i$.
An architecture is a tuple $(\Sigma, \mathcal{P}, R, W)$ such that $\Sigma$ is a finite set
of actions or players, $\mathcal{P}$ is a finite set of processes,
$R : \Sigma \to 2^{\mathcal{P}}$ assigns to each $a \in \Sigma$ its read domain $R(a)$,
and $W : \Sigma \to 2^{\mathcal{P}}$ assigns to each $a \in \Sigma$ its write domain $W(a)$. We
only consider architectures satisfying the following natural restriction, already
considered in [13], and sufficient to get a dependence alphabet on actions.

$$\forall a \in \Sigma,\quad \emptyset \neq W(a) \subseteq R(a)$$

$$\forall a, b \in \Sigma,\quad R(a) \cap W(b) = \emptyset \iff R(b) \cap W(a) = \emptyset$$

We define the dependence relation over $\Sigma$ as
$D = \{(a, b) \mid R(a) \cap W(b) \neq \emptyset\}$.
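
The two restrictions and the induced dependence relation are easy to check mechanically. The sketch below is only an illustration: the dictionary-based encoding of the read and write domains, the function names, and the three-action architecture are assumptions, not part of the paper.

```python
def check_architecture(actions, read, write):
    """Check the two restrictions imposed on an architecture."""
    for a in actions:
        # The write domain must be nonempty and included in the read domain.
        if not write[a] or not write[a] <= read[a]:
            return False
    for a in actions:
        for b in actions:
            # R(a) and W(b) are disjoint iff R(b) and W(a) are disjoint.
            if bool(read[a] & write[b]) != bool(read[b] & write[a]):
                return False
    return True


def dependence_relation(actions, read, write):
    """Return D = {(a, b) | R(a) intersects W(b)}."""
    return {(a, b) for a in actions for b in actions if read[a] & write[b]}


# Hypothetical architecture: a acts on process 1, b on process 2, c on both.
ACTIONS = ["a", "b", "c"]
READ = {"a": {1}, "b": {2}, "c": {1, 2}}
WRITE = {"a": {1}, "b": {2}, "c": {1, 2}}

D = dependence_relation(ACTIONS, READ, WRITE)
print(check_architecture(ACTIONS, READ, WRITE))
print(("a", "b") in D, ("a", "c") in D)
```

In this toy architecture the actions a and b are independent, while c depends on both of them. The first restriction makes D reflexive and the second makes it symmetric.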

Let $(\Sigma, \mathcal{P}, R, W)$ be an architecture. A distributed game over
$(\Sigma, \mathcal{P}, R, W)$ is given by a tuple
$G = (\Sigma_0, \Sigma_1, (Q_i)_{i \in \mathcal{P}}, (T_a)_{a \in \Sigma}, q^0, \mathcal{W})$
where $\Sigma_0$ and $\Sigma_1$ are the players of teams 0 and 1 respectively and we have
$\Sigma = \Sigma_0 \uplus \Sigma_1$; for all $i \in \mathcal{P}$, $Q_i$ is the set of local
states for process $i$; for all $a \in \Sigma$, $T_a \subseteq Q_{R(a)} \times Q_{W(a)}$
gives the local moves of player $a$; $q^0 \in Q = \prod_{i \in \mathcal{P}} Q_i$ is the
starting position of $G$; and $\mathcal{W}$ defines the winning condition of $G$.
The easiest way to define the semantics of the distributed game is via its
sequential game graph whose set of positions is $Q$, the initial position is $q^0$ and
there is an $a$-move from $p \in Q$ to $q \in Q$ (denoted $p \xrightarrow{a} q$) if
$(p_{R(a)}, q_{W(a)}) \in T_a$ and
$q_{\mathcal{P} \setminus W(a)} = p_{\mathcal{P} \setminus W(a)}$. A sequential play is a
sequence $q^0 \xrightarrow{a_1} q^1 \xrightarrow{a_2} q^2 \cdots$.
Note that in a position $p \in Q$, several players of team 0 and of team 1 may be
simultaneously enabled, hence this sequential game graph does not correspond
to a conventional (sequential) game in which each position is either a position
of player (team) 0 or a position of player (team) 1.
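
The definition of an a-move can be made concrete with a short sketch. Everything below is a hypothetical encoding chosen for illustration: global positions are dictionaries from processes to local states, and each set of local moves is a set of pairs of tuples indexed by the sorted read and write domains.

```python
def a_moves(p, a, read, write, local_moves):
    """Return the positions q such that there is an a-move from p to q.

    p maps each process to its local state. local_moves[a] is a set of pairs
    (r, w), where r and w are tuples of local states indexed by the sorted
    processes of the read and write domains of a.
    """
    r_procs, w_procs = sorted(read[a]), sorted(write[a])
    seen = tuple(p[i] for i in r_procs)        # p restricted to the read domain
    successors = []
    for r, w in sorted(local_moves[a]):
        if r == seen:
            q = dict(p)                        # processes outside the write domain are unchanged
            q.update(zip(w_procs, w))          # processes in the write domain get new states
            successors.append(q)
    return successors


# Hypothetical game: a acts on process 1, b acts on process 2.
READ = {"a": {1}, "b": {2}}
WRITE = {"a": {1}, "b": {2}}
MOVES = {"a": {((0,), (1,))}, "b": {((0,), (1,))}}
START = {1: 0, 2: 0}

print(a_moves(START, "a", READ, WRITE, MOVES))
print(a_moves(START, "b", READ, WRITE, MOVES))
```

Both actions are enabled in the starting position of this toy game, which is exactly the situation in which the sequential game graph is not turn-based.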

We consider a new symbol $\bot \notin \Sigma$ with $R(\bot) = W(\bot) = \mathcal{P}$ and
the alphabet
$\Sigma' = \{(a, p) \mid a \in \Sigma \text{ and } p \in Q_{W(a)}\} \cup \{(\bot, q^0)\}$
with the dependence relation
$D' = \{((a, p), (b, q)) \mid R(a) \cap W(b) \neq \emptyset\}$. The winning condition of
the game is a set of words $\mathcal{W} \subseteq \Sigma'^\infty$ which is closed under the
usual trace equivalence (see [2,3]). With the sequential play
$\pi = q^0 \xrightarrow{a_1} q^1 \xrightarrow{a_2} q^2 \cdots$ of $G$ we associate
the word $w = (\bot, q^0)(a_1, q^1_{W(a_1)})(a_2, q^2_{W(a_2)}) \cdots$ over $\Sigma'$.
Note that the word $w$ faithfully encodes the sequential play $\pi$ and team 0 wins
the play $\pi$ if $w \in \mathcal{W}$.

A better semantics of these distributed games is to view a play directly as a
rooted trace over the alphabet $\Sigma'$. A finite or infinite trace
$t = (V, \le, (\ell, \sigma)) \in \mathbb{R}(\Sigma', D')$ is rooted if
$\ell^{-1}(\bot) = \{x_\bot\}$ is a singleton and $x_\bot \le y$ for all $y \in V$. If
$s = (U, \le, (\ell, \sigma))$ is a nonempty prefix of $t$ then
$\sigma(s) = (\sigma(s)_i)_{i \in \mathcal{P}} \in Q$ is defined
by $\sigma(s)_i = \sigma(y)_i$ where $y$ is the maximal vertex in
$\{x \in U \mid i \in W(\ell(x))\}$.

A distributed play of $G$ is a finite or infinite rooted trace
$t = (V, \le, (\ell, \sigma)) \in \mathbb{R}(\Sigma', D')$ such that for each
$a \in \Sigma$ and $x \in \ell^{-1}(a)$ we have
$(\sigma(\Downarrow x)_{R(a)}, \sigma(x)) \in T_a$. The winning condition $\mathcal{W}$
is now a subset of $\mathbb{R}(\Sigma', D')$
and team 0 wins the distributed play $t$ if $t \in \mathcal{W}$. The two definitions above are
indeed equivalent but the second one is better suited to distributed games and 
allows a natural definition of a distributed strategy. In sequential games, one 
often considers infinite plays only. We also consider finite plays because it can 
be more convenient. 

Let $t = (V, \le, \ell, \sigma) \in \mathbb{R}(\Sigma', D')$ be a rooted trace and let
$J \subseteq \mathcal{P}$. The trace $\partial_J t$ is the prefix of $t$ defined by the
set of vertices $U = \downarrow\{x \in V \mid W(\ell(x)) \cap J \neq \emptyset\}$.

An asynchronous mapping [1] is a function $\mu : \mathbb{M}(\Sigma, D) \to M$ such that
$\mu(\partial_{A \cup B} t)$ only depends on $\mu(\partial_A t)$ and $\mu(\partial_B t)$,
and $\mu(\partial_{R(a)}(t \cdot a))$ only depends on $\mu(\partial_{R(a)} t)$ and $a$.
Asynchronous mappings can be computed by deterministic
asynchronous transition systems. A distributed memory on a game $G$ is an
asynchronous mapping $\mu : \mathbb{M}(\Sigma', D') \to M$. It will be used by players of
team 0 as an abstraction (computed in $M$) of their causal view of the play. A
distributed strategy with memory $\mu$ for team 0 ($\mu$-DS) is a pair $(f, \mu)$ where
$f$ is a partial function
$f : \bigcup_{a \in \Sigma_0} Q_{R(a)} \times M^{R(a)} \times \{a\} \to Q_{W(a)}$
such that if $f(p, m, a) = q$,
then $(p, q) \in T_a$. Intuitively, if $f(p, m, a) = q$, then the strategy $f$ dictates an
$a$-move to $q \in Q_{W(a)}$ when the memory of the play that $a$ can observe using $\mu$
is $m \in M^{R(a)}$. If $f(p, m, a)$ is undefined, the $a$-move is disabled by the strategy.
Note that several players of team 0 may be simultaneously enabled by $f$ during
a play. In the sequel we write $f$ instead of $(f, \mu)$, $\mu$ being understood.

Let $f$ be a $\mu$-DS. Let $\overline{\mu}(t) = (\mu(\partial_i(t)))_{i \in \mathcal{P}}$. A
distributed play $t = (V, \le, \ell, \sigma) \in \mathbb{R}(\Sigma', D')$ is an $f$-play if,
for all $x \in V$ with $a = \ell(x) \in \Sigma_0$,

$$\sigma(x) = f(\sigma(\Downarrow x)_{R(a)}, \overline{\mu}(\Downarrow x)_{R(a)}, a)$$

The play $t$ is $f$-maximal if
$f(\sigma(\partial_{R(a)} t)_{R(a)}, \overline{\mu}(\partial_{R(a)} t)_{R(a)}, a)$ is
undefined for all $a \in \Sigma_0$ such that $\partial_{R(a)} t$ is finite. The maximality
condition is natural: if the DS of team 0 dictates some $a$-moves at some $f$-play $t$,
then the $f$-play $t$ is not over and we do not have to decide whether it is winning or
not for team 0. Note that this applies also if $t$ is infinite and corresponds to some
fairness condition: along an infinite $f$-play, a move of team 0 cannot be ultimately
enabled by $f$. Observe that any $f$-play $t$ is the prefix of some $f$-maximal $f$-play.
If each $f$-maximal $f$-play is in $\mathcal{W}$ then $f$ is a winning distributed
strategy (WDS) for team 0.

A distributed game is not necessarily determined in the sense that it is possible
that neither team 0 nor team 1 has a WDS, even with perfect memory.
For instance, consider
$G = (\Sigma_0, \Sigma_1, (Q_i)_{i \in \mathcal{P}}, (T_a)_{a \in \Sigma}, q^0, \mathcal{W})$
with $\Sigma_0 = \{a\}$, $\Sigma_1 = \{b\}$, $\mathcal{P} = \{1, 2\}$,
$R(a) = W(a) = \{1\}$, $R(b) = W(b) = \{2\}$, $Q_1 = Q_2 = \{1\}$,
$T_a = Q_1^2$, $T_b = Q_2^2$, $q^0 = (1, 1)$, and
$\mathcal{W} = \mathbb{M}(\Sigma', D') \cup \{(\bot, q^0)(a, 1)^\omega (b, 1)^\omega\}$.
Assume that team 0 has a DS $f$. If $f((\bot, q^0)(a, 1)^n, a) \neq \emptyset$ for all
$n \ge 0$, then team 0 loses if team 1 does not play at all, yielding the play
$(\bot, q^0)(a, 1)^\omega$. Conversely, if there exists $n \ge 0$ such that
$f((\bot, q^0)(a, 1)^n, a) = \emptyset$, then team 0 loses if team
1 makes infinitely many moves. Symmetrically, team 1 does not have a WDS.

Actually, this non-determinacy is not a problem. For the distributed con- 
trol problem, we are looking for a WDS allowing controllable events (team 0) 
to enforce good behaviors but we are not interested in a winning distributed 
strategy for the uncontrollable events: uncontrollable events are played by an 
environment, and there is no reason to consider only distributed environments. 

A memoryless distributed strategy (MDS) is a $\mu$-DS with
$|\mu(\mathbb{M}(\Sigma', D'))| = 1$, that is, the memory does not record any
information. In this case, we write $f(p, a)$ instead of $f(p, m, a)$. A perfect-memory
distributed strategy is a $\mu$-DS with $\mu(t) = t$. It provides for a move to $x$ with
$\ell(x) = a$ the full causal view $m = (\partial_i \Downarrow x)_{i \in R(a)}$. Since the
state component can be computed from this causal view,
one can drop the state component in $f$ and write $f(m, a)$ instead of $f(p, m, a)$.
As in the sequential case, one can embed a given memory into the game.

Proposition 1. Let $G$ be a distributed game and let $\mu$ be a distributed memory
on $G$. One can construct a distributed game $G^\mu$ such that there exists a $\mu$-WDS
for $G$ iff there exists a WMDS in $G^\mu$. Moreover, if $G$ is finite and $\mu$ is realized
by a finite asynchronous automaton, then $G^\mu$ is finite.

4 Global Game 

In order to use known results of game theory, we want to define a classical
two-player global game $\overline{G} = (Z, T)$ such that team 0 has a WMDS in the
distributed game $G$ iff player 0 has a winning memoryless strategy in the global
game $\overline{G}$.
The positions of the global game are $Z = Z_0 \uplus Z_1$ where
$Z_0 = Q \times \Sigma_0$ are the positions of player 0 and
$Z_1 = Q \times (\Sigma_1 \cup \{0, 1, 2\})$ are the positions of player 1.
The initial position is $(q^0, 0) \in Z_1$. In a position $(q, a)$, the first component
describes the current global state of the play and the second component is used
both to determine whose turn it is and which action should be executed. The
set $T \subseteq (Z_0 \times Z_1) \cup (Z_1 \times Z)$ of moves is defined as follows:

- $(p, b) \to (p, a)$ with $b \in \{0, 1, 2\}$ and $a \in \Sigma$. Player 1 decides that the
next move should be an $a$-move. In this global game, player 1 is in charge
of deciding which actions are used and in which order. This allows him to
investigate all possible linearizations of distributed plays.

- $(p, a) \to (q, 1)$ with $a \in \Sigma$, $(p_{R(a)}, q_{W(a)}) \in T_a$ and
$q_{\mathcal{P} \setminus W(a)} = p_{\mathcal{P} \setminus W(a)}$. This
$a$-move is executed by player 0 or player 1 depending on whether $a \in \Sigma_0$ or
$a \in \Sigma_1$.

- $(p, a) \to (p, 2)$ with $a \in \Sigma_0$. Player 0 refuses to make an $a$-move.

- $(q, b) \to (q^0, 0)$ with $b \in \{0, 1, 2\}$. These reset-moves are used by player 1 to
show that player 0 is not following a distributed strategy.

Note that player 1 may perform several consecutive moves.
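
The four kinds of moves can be enumerated mechanically. The following sketch is an illustration under stated assumptions: positions are pairs (state, tag), actions are encoded as strings so that they cannot be confused with the tags 0, 1 and 2, and the helper `step(p, a)`, which returns the global states reachable from p by an a-move of the distributed game, is hypothetical.

```python
def global_moves(z, q0, actions, team0, step):
    """Return the successors of the position z = (p, tag) in the global game."""
    p, tag = z
    if tag in (0, 1, 2):
        # Player 1 announces the next action, or resets the play.
        return [(p, a) for a in actions] + [(q0, 0)]
    successors = [(q, 1) for q in step(p, tag)]   # the announced move is executed
    if tag in team0:
        successors.append((p, 2))                 # player 0 may refuse the move
    return successors


# Hypothetical two-process game: a sets process 1 to 1, b sets process 2 to 1.
def toy_step(p, a):
    index = 0 if a == "a" else 1
    if p[index] != 0:
        return []
    q = list(p)
    q[index] = 1
    return [tuple(q)]


Q0 = (0, 0)
print(global_moves((Q0, 0), Q0, ["a", "b"], {"a"}, toy_step))
print(global_moves((Q0, "a"), Q0, ["a", "b"], {"a"}, toy_step))
print(global_moves((Q0, "b"), Q0, ["a", "b"], {"a"}, toy_step))
```

Only positions whose tag is an action of team 0 offer the refusal move, which matches the third item of the list above.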

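A global play is a finite or infinite sequence $z = z_0 z_1 z_2 \cdots \in Z^\infty$
starting from the initial position $z_0 = (q^0, 0)$ and such that $z_n \to z_{n+1}$ is a
move for all $n \ge 0$. Let $z = z_0 z_1 z_2 \cdots \in Z^\infty$ be a global play and let
$z_n = (q^n, a_n) \in Z$ for $n \ge 0$.
We define by induction the sequence $(t_n)_{n \ge 0}$ of traces in
$\mathbb{M}(\Sigma', D')$ associated with $z$.
If $z_n = (q^0, 0)$ then $t_n = (\bot, q^0)$. If $a_{n+1} \in \Sigma \cup \{2\}$ then
$t_{n+1} = t_n$. Finally, if $a_n = a \in \Sigma$ and $a_{n+1} = 1$ then
$t_{n+1} = t_n \cdot (a, q^{n+1}_{W(a)})$. We prove by induction that
$t_n$ is a distributed play and $\sigma(t_n) = q^n$ for all $n \ge 0$. The only non
trivial case is when $a_n = a \in \Sigma$ and $a_{n+1} = 1$. By induction, $t_n$ is a
distributed play and $\sigma(t_n) = q^n$. We have
$(q^n_{R(a)}, q^{n+1}_{W(a)}) \in T_a$ and
$q^n_{R(a)} = \sigma(\partial_{R(a)} t_n)_{R(a)}$. Therefore,
$t_{n+1}$ is a distributed play and using
$q^{n+1}_{\mathcal{P} \setminus W(a)} = q^n_{\mathcal{P} \setminus W(a)}$ we get
$\sigma(t_{n+1}) = q^{n+1}$.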

The global play $z$ is consistent if for all $j, k \ge 0$ with
$a_j = a_k = a \in \Sigma_0$ and $q^j_{R(a)} = q^k_{R(a)}$ we have
$a_{j+1} = a_{k+1}$ and $q^{j+1}_{W(a)} = q^{k+1}_{W(a)}$. The global play
$z$ is fair if $\{n \ge 0 \mid a_n = 0\}$ is finite and for all $a \in \Sigma_0$,
$\{n \ge 0 \mid a_n = a\}$ is infinite. If $z$ is both consistent and fair then we let
$N(z) = \max\{n \mid a_n = 0\}$. The
sequence $(t_n)_{n \ge N(z)}$ is increasing and admits a least upper bound $t(z)$ which is
a distributed play of $G$.

The winning condition $\overline{\mathcal{W}}$ of $\overline{G}$ only involves infinite
plays $z \in Z^\omega$. If $z$ is not
consistent then player 0 loses the game since this reveals that he does not mimic
a memoryless distributed strategy. If $z$ is not fair then player 1 loses the game.
Finally, if $z$ is both consistent and fair then player 0 wins the game iff
$t(z) \in \mathcal{W}$.
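
Consistency is a safety property, so it can be checked on finite prefixes of a global play (fairness cannot, since it speaks about infinitely many positions). The sketch below is an illustration only: it assumes that positions are pairs (state, tag) with states encoded as dictionaries from processes to local states, and the function name and the sample plays are hypothetical.

```python
def is_consistent(play, team0, read, write):
    """Check consistency of a finite prefix of a global play.

    Whenever an action a of team 0 is announced twice in states that agree on
    the read domain of a, the two answers must carry the same tag and agree
    on the write domain of a.
    """
    answers = {}
    for (q, tag), (q_next, tag_next) in zip(play, play[1:]):
        if tag not in team0:
            continue
        view = (tag, tuple(q[i] for i in sorted(read[tag])))
        answer = (tag_next, tuple(q_next[i] for i in sorted(write[tag])))
        if answers.setdefault(view, answer) != answer:
            return False
    return True


READ = {"a": {1}, "b": {2}}
WRITE = {"a": {1}, "b": {2}}
S00, S10, S01, S11 = {1: 0, 2: 0}, {1: 1, 2: 0}, {1: 0, 2: 1}, {1: 1, 2: 1}

# Player 0 plays a, player 1 resets, plays b, then asks for a again.
GOOD = [(S00, 0), (S00, "a"), (S10, 1), (S00, 0), (S00, "b"), (S01, 1), (S01, "a"), (S11, 1)]
# Same play, but player 0 now refuses a although process 1 is in the same state.
BAD = GOOD[:-1] + [(S01, 2)]

print(is_consistent(GOOD, {"a"}, READ, WRITE))
print(is_consistent(BAD, {"a"}, READ, WRITE))
```

In the second play the global states differ at the two announcements of a, but they agree on the read domain of a, so the different answers reveal that player 0 is not mimicking a memoryless distributed strategy.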

A (global) strategy (S) for player 0 in $\overline{G}$ is a mapping
$g : Z^* Z_0 \to Z_1$ such that $g(z(p, a)) = (q, b)$ implies $(p, a) \to (q, b)$. A
global play $z = z_0 z_1 z_2 \cdots \in Z^\infty$
is played according to $g$ ($g$-play) if each move of player 0 is done according to
$g$: $z_k \in Z_0$ implies $z_{k+1} = g(z_0 \cdots z_k)$. If player 0 wins all infinite
$g$-plays then $g$ is a winning strategy (WS) for player 0. A strategy $g$ is memoryless
if for all $x, x' \in Z^*$ and $y \in Z_0$, we have $g(xy) = g(x'y)$. We write MS and WMS
for memoryless strategy and winning memoryless strategy.
We can now state the main result of this section.

Theorem 1. The following conditions are equivalent for a distributed game G: 

1. There exists a WMDS for team 0 in the distributed game G. 

2. There exists a WMS for player 0 in the global game $\overline{G}$.

3. There exists a WS for player 0 in the global game $\overline{G}$.

The following proposition gives the construction used for the implication (1 ⇒ 2).



Proposition 2. Let $f$ be a deterministic WMDS for team 0 in $G$. For
$(p, a) \in Z_0$, we define $g((p, a)) = (p, 2)$ if $f(p_{R(a)}, a) = \emptyset$ and
$g((p, a)) = (q, 1)$ with
$q_{\mathcal{P} \setminus W(a)} = p_{\mathcal{P} \setminus W(a)}$ and
$f(p_{R(a)}, a) = \{q_{W(a)}\}$ otherwise. Then, $g$ is a WMS for
player 0 in the global game $\overline{G}$.
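
The construction of Proposition 2 is short enough to be written down directly. The sketch below only illustrates the shape of the definition, not its correctness proof. It assumes a hypothetical encoding in which global states are dictionaries from processes to local states, and in which `f(view, a)` returns `None` when the a-move is disabled and otherwise a dictionary with the new local states of the write domain of a.

```python
def lift_strategy(f, read):
    """Build a memoryless global strategy g from a memoryless distributed f."""
    def g(position):
        p, a = position
        view = {i: p[i] for i in read[a]}   # p restricted to the read domain of a
        target = f(view, a)
        if target is None:
            return (p, 2)                   # player 0 refuses the a-move
        q = dict(p)
        q.update(target)                    # only the write domain of a changes
        return (q, 1)
    return g


# Hypothetical strategy: a moves process 1 from state 0 to state 1, and is
# disabled in every other local state.
def toy_f(view, a):
    return {1: 1} if view[1] == 0 else None


g = lift_strategy(toy_f, {"a": {1}})
print(g(({1: 0, 2: 5}, "a")))
print(g(({1: 1, 2: 5}, "a")))
```

Since g looks only at the restriction of the current state to the read domain of a, every play it produces is consistent by construction.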

To prove the implication (2 ⇒ 1) of Theorem 1, we exploit reset-moves.

Lemma 1. Let $g$ be a WMS of player 0 in the global game $\overline{G}$. Let
$(p^1, a) \in Z_0$ and $(p^2, a) \in Z_0$ be accessible in $g$-plays and such that
$p^1_{R(a)} = p^2_{R(a)}$. Then,
$g(p^1, a) = (p^1, 2)$ iff $g(p^2, a) = (p^2, 2)$, and if $g(p^1, a) = (q^1, 1)$ and
$g(p^2, a) = (q^2, 1)$ then $q^1_{W(a)} = q^2_{W(a)}$.

Using this lemma, we can now transform a WMS in $\overline{G}$ into a WMDS in $G$.

Proposition 3. Let $g$ be a WMS of player 0 in the global game $\overline{G}$. For
$(p, a) \in Z_0$ accessible by a $g$-play, we define $f(p_{R(a)}, a) = \emptyset$ if
$g(p, a) = (p, 2)$ and $f(p_{R(a)}, a) = \{q_{W(a)}\}$ if $g(p, a) = (q, 1)$. Then, $f$ is
a WMDS of team 0 in the distributed game $G$.

Even if $\mathcal{W}$ is rational ($\mathcal{W} = [L]$, where
$L \in \mathrm{Rat}(\Sigma^*)$), determining if team 0
has a W(M)DS is undecidable. Indeed, on $\mathbb{M}(\Sigma, D) = A^* \times B^*$,
determining if a rational trace language $\mathcal{L}$ is $[\Sigma^*]$ is undecidable [2].

From such a language $\mathcal{L}$, we construct a 2-process game in which team 0
has a WMDS iff $\mathcal{L} = [\Sigma^*]$: $\Sigma_0 = \emptyset$,
$\Sigma_1 = A \uplus B$, $R(a) = W(a) = \{1\}$ for $a \in A$
and $R(b) = W(b) = \{2\}$ for $b \in B$. Finally, $|Q| = 1$ (so that we identify
$\Sigma'$ and $\Sigma$), and
$\mathcal{W} = \mathcal{L} \cup (\mathbb{R}(\Sigma, D) \setminus \mathbb{M}(\Sigma, D))$.
Players of team 1 nondeterministically
choose a move in some finite local game, so that any possible trace is a play.
Now, team 0 has a WMDS iff it has a WDS iff team 1 cannot generate a finite
trace outside $\mathcal{L}$, that is, iff $\mathcal{L} = [\Sigma^*]$.

We now explain how to use Theorem 1 to decide if team 0 has a WDS. Denote
by $\mathrm{Lin}(t)$ the set of all linearizations of $t \in \mathbb{R}(\Sigma, D)$.
Properties considered in practice are recognizable (i.e., $\mathrm{Lin}(\mathcal{W})$ is
rational), and as noted in [6], there are
many temporal logics expressing only recognizable specifications. To determine
whether team 0 has a WMDS in $G$ with $\mathcal{W}$ recognizable, we could enumerate
all memoryless distributed strategies for team 0, and check whether one is winning.
This amounts to testing an inclusion between recognizable trace languages.
Theorem 1 provides a better algorithm. The principle is to build the global game
$\overline{G}$, to transform it into a parity game and to apply known algorithms.
Let $\mathcal{A}$ be a parity automaton accepting $\mathrm{Lin}(\mathcal{W})$. The winning
condition $\overline{\mathcal{W}}$ for player 0 on the global game $\overline{G}$ can be
defined by
$\overline{\mathcal{W}} = \mathcal{W}_c \cap (\mathrm{Lin}(\mathcal{W}) \cup \mathcal{W}_{nf})$,
where $\mathcal{W}_c = \{z \in Z^\omega \mid z \text{ is consistent}\}$, and
$\mathcal{W}_{nf} = \{z \in Z^\omega \mid z \text{ is not fair}\}$. We
describe informally how to construct a parity automaton accepting
$\overline{\mathcal{W}}$. One can
build a Büchi automaton accepting consistent plays in $Z^\omega$: it records all
transitions $(p_{R(a)}, q_{W(a)})$ performed by player 0, as well as the refused
transitions. It
falls into the unique rejecting state as soon as an inconsistent move is detected.
One can also build a parity automaton checking that a play, supposed consistent,
is not fair, by checking that there is an infinite number of reset moves (a Büchi
condition) or that, for some $a \in \Sigma_0$, there is only a finite number of states of
the form $(p, a)$ (co-Büchi conditions). From the parity automaton $\mathcal{A}$ and from
these automata, it should now be clear how to build a parity automaton for
$\overline{\mathcal{W}}$.

Observe that using Proposition 1, one can also determine whether team 0
has a $\mu$-WDS for a given finite distributed memory $\mu$.



5 Related Approaches 

A distributed game $\mathcal{G} = (P, E, Tr, Acc, q^0)$ as in [7] is built from $n$ local
games $G_1, \ldots, G_n$, where $G_i = (P_i, E_i, Tr_i)$ with
$Tr_i \subseteq P_i \times E_i$. Positions of the
environment are $E = \prod_i E_i$. The position set of the players is
$P = \prod_i (P_i \cup E_i) \setminus E$.
Transitions of the players are defined with a cartesian product:
$Tr_P = \big(\prod_i (Tr_i \cup \Delta_i)\big) \cap (P \times E)$ where
$\Delta_i = \{(x_i, x_i) \mid x_i \in E_i\}$ is the diagonal. Transitions of the
environment are simply given by a subset $Tr_E$ of $E \times P$, and
$Tr = Tr_E \uplus Tr_P$. A
play of $\mathcal{G}$ starts in position $q^0 \in E$, and moves from the environment and
from the players alternate. Hence, any infinite play is in $(E \cdot P)^\omega$, and the
winning condition is a subset $Acc \subseteq (E \cdot P)^\omega$.

There is a natural translation from these games to our setting. With
$\mathcal{G}$, we associate the distributed game
$G = (\Sigma_0, \Sigma_1, (Q_i)_{i \in \mathcal{P}}, (T_a)_{a \in \Sigma}, q^0, \mathcal{W})$
as follows. The set
of processes is $\mathcal{P} = \{1, \ldots, n\}$ and the local states are
$Q_i = P_i \cup E_i$ for $i \in \mathcal{P}$.
Team 0 is defined by $\Sigma_0 = \{1, \ldots, n\}$ with $R(i) = W(i) = \{i\}$ for all
$i \in \Sigma_0$. The
transitions for player $i$ are simply $T_i = Tr_i$. Team 1 consists of a single player $e$
(the environment) with $R(e) = W(e) = \mathcal{P}$. Its transitions are
$T_e = Tr \cap (E \times P)$.

To define the winning condition $\mathcal{W}$, we associate with each infinite play
$w = e^0 x^1 e^1 \cdots \in (E \cdot P)^\omega$ of $\mathcal{G}$ with $e^0 = q^0$ a
distributed play $\mathrm{trace}(w) \in \mathbb{R}(\Sigma', D')$
as the least upper bound of $(t_n)_{n \ge 0}$ where the increasing sequence
$(t_n)_{n \ge 0}$ is
defined inductively by $t_0 = (\bot, q^0)$ and $t_{n+1} = t_n \cdot (e, x^{n+1})$

Finally, the winning condition is
$\mathcal{W} = \mathrm{trace}(Acc) \cup \{t \in \mathbb{M}(\Sigma', D') \mid \sigma(t) \in E\}$.

The distributed games of [7] are thus a special case of our games in which 
all players of team 0 are completely local and the environment consists of a 
single global player. Note that, in the game G, information between players can 
only flow through environment moves and the environment can decide which 
information is exchanged between players. 

However, the crucial difference between games of [7] and the ones presented
here concerns the definition of strategies. In [7], a strategy is a tuple of mappings
$f_i : (E_i P_i)^+ \to E_i$. Hence a move of player $i$ only depends on its local view
consisting of the history on its process only. In our setting, the strategy of player
$i$, which is the mapping $f(-, i)$, is based on its causal view. Since actions of
the environment are global, the causal view is almost the complete global view
of the game. It is therefore clear that if there exists a WS for the players in
$\mathcal{G}$, then there exists a WDS for team 0 in $G$. The converse is false: it is easy
to find a game $\mathcal{G}$ not determined while there is a winning strategy for team 0
in $G$. Yet, one can prove that if a game $\mathcal{G}$ of [7] is determined, then there is
a WDS for team 0 in $G$ iff players have a WS in $\mathcal{G}$.

If we want to get an equivalence, we have to restrict the memory used by our
strategies to the local view, that is, to change the notion of memory used by the
strategies. The $i$-projection of $t \in \mathbb{M}(\Sigma', D')$ is $\Pi_i(t)$ where
$\Pi_i$ is the morphism
from $\mathbb{M}(\Sigma', D')$ to $Q_i^*$ defined by $\Pi_i(x, q) = q_i$ if $x = e$ or
$x = i$ and $\Pi_i(x, q) = \varepsilon$
otherwise. Since player $i$ is only aware of the moves he takes part in, we have to
abstract away from unobservable stuttering of the environment. For this we use
the congruence on $Q_i^*$ generated by $p_i^2 = p_i$ for $p_i \in E_i$ and we write
$[w]_\equiv$ for the
equivalence class of $w \in Q_i^*$. We say that a distributed strategy $f$ is local if for
all $i \in \Sigma_0$, the move dictated to player $i$ depends only on $i$ and on the
equivalence class of the $i$-projection of its causal view.

Proposition 4. The players have a WS in $\mathcal{G}$ iff team 0 has a local WDS in $G$.

In the distributed control problem presented in [6], the environment performs
local moves and the transitions of a controllable action are defined by a cartesian
product of local transition functions. It is therefore straightforward to translate
these games to our framework as above. Since local strategies are used in [6],
Proposition 4 also holds.

Abstract. The unified property specifies that a comparison-based search
structure can quickly find an element nearby a recently accessed
element. Iacono [Iac01] introduced this property and developed a static
search structure that achieves the bound. We present a dynamic search
structure that achieves the unified property and that is simpler than
Iacono's structure. Among all comparison-based dynamic search structures,
our structure has the best proved bound on running time.



1 Introduction 

The classic splay conjecture says that the amortized performance of splay trees 
[ST85] is within a constant factor of the optimal dynamic binary search tree for 
any given request sequence. This conjecture has motivated the study of sublo- 
garithmic time bounds that capture the performance of splay trees and other 
comparison-based data structures. For example, it is known that the perfor- 
mance of splay trees satisfies the following two upper bounds. The working-set 
bound [ST85] says roughly that recently accessed elements are cheap to access 
again. The dynamic-finger bound [CMSS00,Col00] says roughly that it is cheap
to access an element that is nearby the previously accessed element. These
bounds are incomparable: one does not imply the other. For example, the access
sequence 1, n, 1, n, 1, n, . . . has a small working-set bound (constant amortized 
time per access) because each accessed element was accessed just two time units 
ago. In contrast, for this sequence the dynamic-finger bound is large (logarith- 
mic time per access) because each accessed element has rank distance n — 1 
from the previously accessed element. On the other hand, the access sequence 
1, 2, . . . , n, 1, 2, . . . , n, . . . has a small dynamic-finger bound because most acces- 
sed elements have rank distance 1 to the previously accessed element, whereas 
it has a large working-set bound because each accessed element was accessed n 
time units ago. 

In SODA 2001, Iacono [Iac01] proposed a unified bound (defined below) that
is strictly stronger than all other proved bounds about comparison-based
structures. Roughly, the unified bound says that it is cheap to access an element
that is nearby a recently accessed element. For example, the access sequence
$1, \frac{n}{2}+1, 2, \frac{n}{2}+2, 3, \frac{n}{2}+3, \ldots$ has a small unified bound
because most accessed
elements have rank distance 1 to the element accessed two time units ago, whereas
it has large working-set and dynamic-finger bounds. It remains open whether
splay trees satisfy the unified bound. However, Iacono [Iac01] developed the
unified structure which attains the unified bound. Among all comparison-based data
structures, this structure has the best proved bound on running time.

The only shortcomings of the unified data structure are that it is static (keys 
cannot be inserted or deleted) , and that both the algorithms and the analysis are 
complicated. We improve on all of these shortcomings with a simple dynamic 
unified structure. Among all comparison-based dynamic data structures, our 
structure has the best proved bound on running time. 

2 Unified Property 

Our goal is to maintain a dynamic set of elements from a totally ordered universe 
in the (unit-cost) comparison model on a pointer machine. Consider a sequence 
of m operations — insertions, deletions, and searches — where the ith operation 
involves element $x_i$. Let $S_i$ denote the set of elements in the structure just
before operation $i$ (at time $i$). Define the working-set number $t_i(z)$ of an element
$z$ at time $i$ to be the number of distinct elements accessed since the last access
to $z$ and prior to time $i$, including $z$. Define the rank distance $d_i(x, y)$ between
elements $x$ and $y$ at time $i$ to be the number of distinct elements in $S_i$ that fall
between $x$ and $y$ in rank order. A data structure has the unified property if the
amortized cost of operation $i$ is
$O(\lg \min_{y \in S_i} [t_i(y) + d_i(x_i, y) + 2])$, the unified
bound. Intuitively, the unified bound for accessing an element $x_i$ is small if any
element $y$ is nearby $x_i$ in both time and space.
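
To make the definition concrete, the following sketch evaluates the quantity inside the logarithm for a static set and a list of past accesses. It is an illustration under stated assumptions, not part of the data structure: the rank distance is taken to be the absolute difference of ranks (which matches the examples of the introduction), an element that was never accessed is given working-set number n, and all function names are hypothetical.

```python
import math


def working_set_number(history, z, n):
    """Distinct elements accessed since the last access to z, including z."""
    if z not in history:
        return n  # assumption: a never-accessed element gets the largest value
    last = len(history) - 1 - history[::-1].index(z)
    return len(set(history[last:]))


def unified_term(history, x, elements):
    """Minimum over y of t(y) + d(x, y) + 2 for a static set of elements."""
    rank = {y: r for r, y in enumerate(sorted(elements))}
    n = len(rank)
    return min(
        working_set_number(history, y, n) + abs(rank[x] - rank[y]) + 2
        for y in rank
    )


def unified_bound(history, x, elements):
    """Base-2 logarithm of the unified term."""
    return math.log2(unified_term(history, x, elements))


ELEMENTS = range(1, 17)  # n = 16
# Interleaved sequence 1, n/2 + 1, 2, n/2 + 2, 3, ... followed by an access to 11.
print(unified_term([1, 9, 2, 10, 3], 11, ELEMENTS))
# Alternating sequence 1, n, 1, n followed by an access to 1.
print(unified_term([1, 16, 1, 16], 1, ELEMENTS))
```

In the interleaved sequence the minimum is reached for y = 10, which was accessed two time units earlier and is at rank distance 1 from 11. In the alternating sequence it is reached for y = 1 itself, whose working-set number is 2.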

3 New Unified Structure 

In this section, we develop our dynamic unified structure which establishes the 
following theorem: 

Theorem 1. There is a dynamic data structure in the comparison model on a
pointer machine that supports insertions and searches within the unified bound
and supports deletions within the unified bound plus $O(\lg\lg|S_i|)$ time
(amortized).

An interesting open problem is to attain the unified bound for all three 
operations simultaneously. 

3.1 Data Structure 

The bulk of our unified structure consists of $\Theta(\lg\lg|S_i|)$ balanced search trees
and linked lists whose sizes increase doubly exponentially; see Fig. 1. Each tree
$T_k$, $k \ge 0$, stores between $2^{2^k}$ and $2^{2^{k+1}} - 1$ elements, ordered by their
rank, except that the last tree may have fewer elements. We can store each tree $T_k$
using any balanced search tree structure supporting insertions, deletions, and
searches in $O(\lg|T_k|)$ time, e.g., B-trees [BM72]. List $L_k$ stores exactly the same
elements stored in $T_k$, but ordered by the time of access. We store pointers
between corresponding nodes in $T_k$ and $L_k$.

M. Badoiu and E.D. Demaine

Fig. 1. Overview of our dynamic unified structure. In addition to a single finger search
tree storing all elements in the dynamic set $S_i$, there are
$\ell + 1 = O(\lg\lg|S_i|)$ balanced
search trees and lists whose sizes grow doubly exponentially. (As drawn, the heights
accurately double from left to right.)

Each element $x$ may be stored by several nodes in various trees $T_k$, or possibly
none at all, but $x$ appears at most once in each tree $T_k$. Each tree node storing
element $x$ represents an access to $x$ at a particular time. At most one tree node
represents each access to $x$, and some accesses to $x$ have no corresponding tree
node. We maintain the invariant that the access times of nodes in tree $T_k$ are all
more recent than access times of nodes in tree $T_{k+1}$. Thus, the concatenation of
corresponding nodes in lists $L_0, L_1, L_2, \ldots$ is also ordered by access time.

Our unified structure also stores a single finger search tree containing all
$n$ elements. We can use any finger search tree structure supporting insertions,
deletions, and searches within rank distance $r$ of a previously located element in
$O(\lg(r + 2))$ amortized time, e.g., level-linked B-trees [BT80]. Each node in tree
$T_k$ stores a pointer to the unique node in this finger search tree corresponding
to the stored element.^

^ In fact, because nodes in the finger search tree may move (e.g., from a B-tree split), 
each node in tree Tk stores a pointer to an indirect node, and each indirect node 
is connected by pointers to the corresponding node in the finger search tree. The 
former pointers never change, and the latter pointers can be easily maintained when 
nodes move in the finger search tree. 
3.2 Search

Up to constant factors, the unified property requires us to find an element $x = x_i$
in $O(2^k)$ time if it is within rank distance $2^{2^k}$ of an element $y$ with working-set
number $t_i(y) \le 2^{2^k}$. We maintain the invariant that all such elements $x$ are
within rank distance $3 \cdot 2^{2^k}$ of some element $y'$ in $T_0 \cup T_1 \cup \cdots \cup T_k$. (This
invariant is proved below in Lemma 1.)

At a high level, then, our search algorithm will investigate the elements in
$T_0, T_1, \ldots, T_k$ and, for each such element, search among the elements within
rank distance $3 \cdot 2^{2^k}$ for the query element $x$. The algorithm cannot perform this
procedure exactly, because it does not know $k$. Thus we perform the procedure
for each $k = 0, 1, 2, \ldots$ until success. To avoid repeated searching around the
elements in $T_j$, $j \le k$, we maintain the two elements so far encountered among
these $T_j$'s that are closest to the target $x$, and just search around those two
elements. If any of the searches from any of the elements would be successful,
one of these two searches will be successful.

More precisely, our algorithm to search for an element $x$ proceeds as shown
in Algorithm 1. The variables $L$ and $U$ store pointers to elements in the finger
search tree such that $L \le x \le U$. These variables represent the tightest known
bounds on $x$ among elements that we have located in the finger search tree as
predecessors and successors of $x$ in $T_0, T_1, \ldots, T_k$. In each round, we search for
$x$ in the next tree $T_k$, and update $L$ and/or $U$ if we find elements closer to $x$.
Then we search for $x$ in the finger search tree within rank distance $3 \cdot 2^{2^k}$ of $L$
and $U$.

Thus, if $x$ is within rank distance $3 \cdot 2^{2^k}$ of an element in $T_0 \cup T_1 \cup \cdots \cup T_k$,
then the search algorithm will complete in round $k$. The total running time of $k$
rounds is $\sum_{i=0}^{k} O(\lg |T_i|) = O(2^k)$. Thus, the search algorithm attains the unified
bound, provided we have the invariant in Lemma 1 below.
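
The geometric sum behind this running time can be written out explicitly. The following is a sketch under the assumption that round $i$ costs $O(\lg |T_i|)$ for the search in $T_i$ plus $O(\lg(3 \cdot 2^{2^i} + 2))$ for each of the two finger searches; the constant $c$ is introduced here only for illustration.

```latex
\begin{align*}
\lg |T_i| &< \lg 2^{2^{i+1}} = 2^{i+1}, \\
\lg\bigl(3 \cdot 2^{2^i} + 2\bigr) &\le \lg\bigl(4 \cdot 2^{2^i}\bigr) = 2^{i} + 2, \\
\sum_{i=0}^{k} c \cdot 2^{i} &= c\,\bigl(2^{k+1} - 1\bigr) = O(2^{k}).
\end{align*}
```

Each round therefore costs $O(2^i)$, and the last round dominates the total.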

When the search algorithm finds $x$, it records this most recent access by
inserting a node storing $x$ into the smallest tree $T_0$. This insertion may cause $T_0$
to grow too large, triggering the overflow algorithm described next.



3.3 Overflow 

It remains to describe what we do when a tree becomes too full; see Algorithm 2.
The main idea is to promote all but the most recent $2^{2^k}$ elements from
$T_k$ to $T_{k+1}$, by repeated insertion into $T_{k+1}$ and deletion from $T_k$. In addition, we
discard elements that would be promoted but are within rank distance $2^{2^{k+1}}$ of
other promoted elements. Such discards are necessary to prevent excessive overflows
in the future. The intuition of why discards do not substantially slow future
searches is that, for the purposes of searching for an element $x$ within rank
distance $2^{2^{k+1}}$ of elements in $T_{k+1}$, it is redundant up to a factor of 2 to have
more than one element in $T_{k+1}$ within a rank range of $2^{2^{k+1}}$. This intuition is
formalized by the following lemma:
Algorithm 1. Searching for an element $x$.

• Initialize $L \leftarrow -\infty$ and $U \leftarrow \infty$.

• For $k = 0, 1, 2, \ldots$:ᵃ

  1. Search for $x$ in $T_k$ to obtain two elements $L_k$ and $U_k$ in $T_k$ such that
     $L_k \le x \le U_k$.

  2. Update $L \leftarrow \max\{L, L_k\}$ and $U \leftarrow \min\{U, U_k\}$.

  3. Finger search for $x$ within the rank ranges $[L, L + 3 \cdot 2^{2^k}]$ and
     $[U - 3 \cdot 2^{2^k}, U]$.

  4. If we find $x$ in the finger search tree:

     a) Insert $x$ into tree $T_0$ and at the front of list $L_0$, unless $x$ is
        already in $T_0$.

     b) If $T_0$ is too full (storing $2^{2^1}$ elements), overflow $T_0$ as described
        in Algorithm 2.

     c) Return a pointer to $x$ in the finger search tree.

ᵃ If we reach a $k$ for which $T_k$ does not exist, then $k = O(\lg\lg n)$ and we can
afford to search in the global finger tree.



Algorithm 2. Overflowing a tree $T_k$.

  1. Remove the $2^{2^k}$ most recently accessed elements from list $L_k$ and tree $T_k$.

  2. Build a balanced search tree $T'_k$ and a list $L'_k$ on these $2^{2^k}$ elements.

  3. For each remaining element $z$ in $T_k$, in rank order, if the predecessor of $z$
     in $T_k$ (among the elements not yet deleted) is within rank distance
     $2^{2^{k+1}}$ of $z$, then delete $z$ from $T_k$ and $L_k$.

  4. For each remaining element $z$ in $T_k$, in access-time order:

     a) Search for $z$ in $T_{k+1}$.

     b) If found, remove $z$ from $L_{k+1}$.

     c) Otherwise, insert $z$ into $T_{k+1}$.

  5. Concatenate $L_k$ and $L_{k+1}$ to form a new list $L_{k+1}$.

  6. Replace $L_k \leftarrow L'_k$; $T_k \leftarrow T'_k$.

  7. If $T_{k+1}$ is now too full (stores at least $2^{2^{k+2}}$ elements), recursively
     overflow $T_{k+1}$.
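
The overflow procedure can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: each level is represented only by its access-ordered list (most recent first), plain Python lists replace the balanced trees, and a sorted list `universe` stands in for the finger search tree so that rank distances can be computed. The function name `overflow` and this representation are assumptions made for the example, so the sketch does not achieve the stated time bounds.

```python
from bisect import bisect_left


def overflow(lists, universe, k):
    """Overflow level k into level k+1 (simplified sketch).

    lists[k]  -- keys of level k, most recently accessed first (plays L_k)
    universe  -- sorted list of all keys (stand-in for the finger search tree)
    """
    keep = 2 ** (2 ** k)        # number of most recent elements that stay
    gap = 2 ** (2 ** (k + 1))   # rank distance below which we discard

    def rank(key):
        return bisect_left(universe, key)

    # Steps 1-2: split off the most recent elements; they form the new level k.
    recent, old = lists[k][:keep], lists[k][keep:]

    # Step 3: scan the rest in rank order, discarding any element whose
    # surviving predecessor is within rank distance `gap`.
    kept = []
    for z in sorted(old):
        if kept and rank(z) - rank(kept[-1]) <= gap:
            continue
        kept.append(z)
    kept_set = set(kept)

    # Steps 4-5: promote survivors in access-time order; an element already
    # present at level k+1 keeps only its newer position.
    promoted = [z for z in old if z in kept_set]
    if k + 1 == len(lists):
        lists.append([])
    lists[k + 1] = promoted + [z for z in lists[k + 1] if z not in kept_set]

    # Step 6: install the rebuilt level k.
    lists[k] = recent

    # Step 7: cascade if the next level is now too full.
    if len(lists[k + 1]) >= 2 ** (2 ** (k + 2)):
        overflow(lists, universe, k + 1)


if __name__ == "__main__":
    levels = [[50, 40, 30, 1], [30, 99]]
    overflow(levels, list(range(100)), 0)
    print(levels)
```

In the usage example, 30 and 1 are far apart in rank, so both are promoted, and the older copy of 30 at the next level is dropped.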



Lemma 1. All elements within rank distance $2^{2^k}$ of an element $y$ with working-set
number $t_i(y) \le 2^{2^k}$ are within rank distance $3 \cdot 2^{2^k}$ of some element $y'$ in
$T_0 \cup T_1 \cup \cdots \cup T_k$.



Proof. We track the evolution of $y$ or a nearby element from when it was last
accessed and inserted into $T_0$, to when it moved to $T_1$, $T_2$, and so on, until
access $i$. If the tracked element $y'$ is ever discarded from some tree $T_j$, we continue
by tracking the promoted element within rank distance $2^{2^{j+1}}$ of $y'$. The tracked
element $y'$ makes monotone progress through $T_0, T_1, T_2, \ldots$ because, even if $y'$
is accessed and inserted into $T_0$, the tracked node storing $y'$ is not deleted.
The tracked node also cannot expire from $T_k$ (and get promoted or discarded),
because at most $2^{2^k}$ distinct elements have been accessed in the time window
under consideration, so $y'$ must be among the first $2^{2^k}$ elements in the list $L_k$
when it reaches $L_k$. Therefore, $y'$ remains within rank distance
$2^{2^k} + 2^{2^{k-1}} + \cdots + 2^{2^2} + 2^{2^1}$ of $y$, so we obtain the stronger bound that all elements within
rank distance $2^{2^k}$ of $y$ are within rank distance $2 \cdot 2^{2^k} + 2^{2^{k-1}} + \cdots + 2^{2^2} + 2^{2^1}$
of an element $y'$ in $T_0 \cup T_1 \cup \cdots \cup T_k$.

3.4 Overflow Analysis 

To analyze the amortized cost of the overflow algorithm, we consider the cost of
overflowing $T_k$ into $T_{k+1}$ for each $k$ separately. To be sure that we do not charge
to the same target for multiple $k$, we introduce the notion of a coin $c_k$ which can
be used to pay for one node to overflow from $T_k$ to $T_{k+1}$ as well as for the node
to later be discarded from $T_{k+1}$. A coin $c_k$ cannot pay for overflows or discards
at different levels. We assign coin $c_k$ an intrinsic value of $O(2^k)$ time units, but
twice what is required to pay for a node to overflow and be discarded, so that
whenever paying with a coin $c_k$ we are also left with a fractional coin $\frac{1}{2} c_k$.

For each $k$, we consider the time interval after the previous overflow from $T_k$
into $T_{k+1}$ up to the next overflow from $T_k$ into $T_{k+1}$. At the beginning of this
time interval, we have just completed an overflow involving $\Theta(2^{2^{k+1}})$ elements
each with a $c_k$ coin. From the use of these coins we obtain fractional coins of
half the value, which we can combine to give two whole coins to every node of
$T_0, T_1, \ldots, T_k$, because there are only $\sum_{i=0}^{k} |T_i| = O(2^{2^k}) = o(2^{2^{k+1}})$ such nodes.

Consider a search for an element $x$ during the time interval between the
previous overflow and the next overflow of $T_k$. Suppose $x$ was found at round
$\ell$ of the search algorithm. The cost of searching for $x$ is $O(2^\ell)$ time. We can
therefore afford to give $x$ two coins $c_m$ for each $m \le \ell$. We also award $x$ with
fractional coins $(6/2^{2^m}) c_m$ for each $m > \ell$, which have total worth $o(1)$. We
know that $x$ was within rank distance $3 \cdot 2^{2^\ell}$ of an element $y$ in $T_0 \cup T_1 \cup \cdots \cup T_\ell$.
If $\ell < k$, we assign $y$ as the parent of $x$. (This assignment may later change if we
search for $x$ again.)

Now consider each element $x$ that gets promoted from $T_k$ to $T_{k+1}$ when $T_k$
next overflows. If $x$ has not been searched for since the previous overflow of $T_k$,
then it was in $T_0 \cup T_1 \cup \cdots \cup T_k$ right after the previous overflow, so $x$ has two
coins $c_k$. If the last search for $x$ terminated in round $\ell$ with $\ell \ge k$, then $x$ also
has two coins $c_k$. In either of these cases, $x$ uses one of its own $c_k$ coins to pay
for the cost of its overflow (and wastes the other $c_k$ coin).

If $x$ is within rank distance $2^{2^{k+1}}$ of such an element $y$ in $T_0 \cup T_1 \cup \cdots \cup T_k$
with two $c_k$ coins, then $y$ must not be promoted during this overflow. For if
$y$ expires from $T_k$ during this overflow, then at most one of $x$ and $y$ can be
promoted (whichever is larger), and we assumed it is $x$. Thus, $x$ can steal one of
$y$'s $c_k$ coins and use it for promotion. Furthermore, $y$ can have a $c_k$ coin stolen
at most twice, once by an element $z < y$ and once by an element $z > y$, so we
cannot over-steal. If $y$ remains in $T_0 \cup T_1 \cup \cdots \cup T_k$, its $c_k$ coins will be replenished
after this overflow, so we also need not worry about $y$.

If $x$ has no nearby element $y$ with two $c_k$ coins, we consider the chain connecting
$x$ to $x$'s parent to $x$'s grandparent, etc. Because every element without a
$c_k$ coin has a parent, and because we already considered the case in which an
element with a $c_k$ coin is within rank distance $2^{2^{k+1}}$ of $x$, the chain must extend
so far as to reach an element with rank distance more than $2^{2^{k+1}}$ from $x$. Because
every edge in the chain connects elements within rank distance $3 \cdot 2^{2^k}$, the chain
must consist of at least $2^{2^{k+1}} / (3 \cdot 2^{2^k}) = 2^{2^k}/3$ elements within rank distance
$2^{2^{k+1}}$ of $x$. Because each of these elements has a parent, they must have been
searched for since the last overflow of $T_k$, and were therefore assigned fractional
coins of $(6/2^{2^k}) c_k$. As before, none of these elements could be promoted from
$T_k$ during this overflow because they are too close to the promoted element $x$.
Thus, $x$ can steal a fractional coin of $(3/2^{2^k}) c_k$ from each of these $2^{2^k}/3$
elements' fractional $(6/2^{2^k}) c_k$ coins. Again, this stealing can happen at most twice
for each fractional $(6/2^{2^k}) c_k$ coin, so we do not over-steal.
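
The chain argument rests on two small computations, written out here as a sketch: the minimum number of chain elements, and the total value that $x$ collects when it takes a $(3/2^{2^k})$ fraction of a coin $c_k$ from each of them.

```latex
\begin{align*}
\frac{2^{2^{k+1}}}{3 \cdot 2^{2^k}}
  &= \frac{2^{2^k} \cdot 2^{2^k}}{3 \cdot 2^{2^k}}
   = \frac{2^{2^k}}{3}, \\
\frac{2^{2^k}}{3} \cdot \frac{3}{2^{2^k}}\, c_k &= c_k .
\end{align*}
```

The stolen fractions thus add up to exactly one whole coin $c_k$, which is what a promotion costs.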

Therefore, a promoted element $x$ from the overflow of $T_k$ can find a full coin
$c_k$ to pay for its promotion. The $O(2^k)$ cost of discarding an element $x$ from $T_k$
can be charged to the coin $c_{k-1}$ that brought $x$ there, or if $k = 0$, to the search
that brought $x$ there. This concludes the amortized analysis of overflow.

3.5 Insert 

To insert an element $x$, we first call a slight variation of the search algorithm
from Section 3.2 to find where $x$ fits in the finger search tree. Specifically, we
modify Step 4 to realize when the search has gone beyond $x$, at which point
we can find the predecessor and successor of $x$ in the finger search tree. Then,
as part of Step 4, we insert $x$ at that position in the finger search tree in $O(1)$
amortized time. We execute Steps 4(a-c) as before, inserting $x$ into tree $T_0$ and
list $L_0$.

Because this algorithm is almost identical to the search algorithm, we can use
essentially the same analysis as in Section 3.4. More precisely, when we insert
an element $x$, suppose we find where $x$ fits during round $\ell$ of the algorithm.
Then we assign $x$ a parent as before, and award $x$ two $c_m$ coins for each $m \le \ell$
and fractional coins $(6/2^{2^m}) c_m$ for each $m > \ell$. The only new concern in the
amortized analysis is that the rank order changes by an insertion. Specifically,
the rank distance between an element $z$ and its parent $y$ can increase by 1 because
of an element $x$ inserted between $z$ and $y$. In this case, we set $z$'s parent to $x$
immediately after the insertion, and the proof goes through. Thus, the amortized
cost of an insertion is proportional to the amortized cost of the initial search.

3.6 Delete 

Finally we describe how to delete an element within the unified bound plus
$O(\lg\lg |S_i|)$ time. Once we have found the element $x$ to be deleted within the
unified bound via the search algorithm, we remove $x$ from the finger tree in $O(1)$
time and replace all instances of $x$ in the $T_k$'s with the successor or predecessor
of $x$ in the finger tree. To support each replacement in $O(1)$ time, and obtain
a total bound of $O(\lg\lg |S_i|)$,^ we maintain a list of back pointers from each
element in the finger tree to the instances of that element as tree nodes in the
$T_k$'s. If more than one node in the same tree $T_k$ ever points to the same element,
we remove all but one of them.

^ We maintain the invariant that the number of trees $T_k$ is at most $1 + \lg\lg |S_i|$ simply
by removing a tree $T_k$ if $k$ becomes larger than $\lg\lg |S_i|$. Such trees are not necessary
for achieving the unified bound during searches.

The amortized analysis is again similar to Section 3.4, requiring only the
following changes. Whenever we delete an element $x$ and replace all its instances
by its rank predecessor or successor $y$, element $y$ inherits all of $x$'s coins and
takes over all of $x$'s responsibilities in the analysis. We can even imagine $x$ and
$y$ as both existing with equal rank, and handling their own responsibilities, with
the additional advantage that if either one gets promoted the other one will
be discarded (having the same rank) and hence need not be accounted for. An
edge of a chain can only get shorter by this contraction in rank space, so the
endpoints remain within rank distance $3 \cdot 2^{2^k}$ as required in the analysis. The
unified bound to access an element $z$ may also go down because it is closer in
rank space to some elements, but this property is captured by the removal of
$x$ in the finger tree, and hence finger searches are correspondingly faster. Each
tree $T_k$ might get smaller (if both $x$ and $y$ were in the same tree), requiring us
to break the invariant that $T_k$ stores at least $2^{2^k}$ elements. However, we use this
invariant only in proving Lemma 1, which remains true because the working-set
numbers $t_i(z)$ count accesses to deleted elements and hence do not change.
Abstract. We introduce here a rewrite system in the group of unimodular
matrices, i.e., matrices with integer entries and with determinant
equal to ±1. We use this rewrite system to precisely characterize the
mechanism of the Gaussian algorithm, which finds shortest vectors in a
two-dimensional lattice given by any basis. Putting together the algorithmics
of lattice reduction and rewrite system theory, we propose a
new worst-case analysis of the Gaussian algorithm. There is already an
optimal worst-case bound for some variant of the Gaussian algorithm due
to Vallée [16]. She used essentially geometric considerations. Our analysis
generalizes her result to the case of the usual Gaussian algorithm.
An interesting point in our work is its possible (but not easy) generalization
to the same problem in higher dimensions, in order to exhibit
a tight upper bound for the number of iterations of LLL-like reduction
algorithms in the worst case. Moreover, our method seems to work for
analyzing other families of algorithms. As an illustration, the analysis of
sorting algorithms is briefly developed in the last section of the paper.



1 Introduction 

This paper deals with extracting worst cases of some algorithms. Our method
was originally proposed by the first author [1,2] as a possible approach for solving
the difficult and still open problem of exhibiting worst cases of lattice reduction
algorithms (LLL and its variants). Here the method is applied first to the
Gaussian algorithm that solves the two-dimensional lattice problem and that is
also intensively used by LLL-like algorithms when reducing higher-dimensional
lattices. As another illustration of the method, three sorting algorithms (bubble,
insertion and selection sorts) are also considered. In the sequel, we first briefly
recall the problem of lattice reduction and our motivation to exhibit worst cases
of LLL-like algorithms.

A Euclidean lattice is the set of all integer linear combinations of a set of
linearly independent vectors in $\mathbb{R}^p$. The independent vectors are called a basis
of the lattice. Any lattice can be generated by many bases. All of them have
the same cardinality, which is called the dimension of the lattice. If $B$ and $B'$
represent matrices of two bases of the same lattice in the canonical basis of $\mathbb{R}^p$,
then there is a unimodular matrix $U$ such that $B' = UB$. A unimodular matrix
is a matrix with integer entries and with determinant equal to ±1.
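
The relation $B' = UB$ can be checked on a small example. The sketch below is restricted to the 2×2 case and uses hypothetical helper names (`matmul`, `det2`, `is_unimodular`, `inverse_unimodular`); it illustrates that the inverse of a unimodular matrix is again an integer matrix, so each basis can be recovered from the other by integer combinations.

```python
def matmul(a, b):
    """Product of two 2x2 integer matrices given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_unimodular(u):
    """True if u has integer entries and determinant +1 or -1."""
    return all(isinstance(v, int) for row in u for v in row) and det2(u) in (1, -1)


def inverse_unimodular(u):
    """Inverse of a 2x2 unimodular matrix; since det is +1 or -1,
    dividing by det is the same as multiplying by it."""
    d = det2(u)
    return [[d * u[1][1], -d * u[0][1]],
            [-d * u[1][0], d * u[0][0]]]


if __name__ == "__main__":
    U = [[2, 1], [1, 1]]
    B = [[1, 2], [3, 5]]              # rows are the basis vectors
    B_prime = matmul(U, B)            # another basis of the same lattice
    print(is_unimodular(U), B_prime)
    print(matmul(inverse_unimodular(U), B_prime) == B)
```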

The lattice basis reduction problem is to find bases with good Euclidean 
properties, that is, with sufficiently short and almost orthogonal vectors. 
In two dimensions, the problem is solved by the Gaussian algorithm, which finds,
in any two-dimensional lattice, a basis formed with the shortest possible vectors.
The worst-case complexity of Gauss' algorithm (explained originally in the
vocabulary of quadratic forms) was first studied by Lagarias [7], who showed that
the algorithm is polynomial with respect to its input. The worst-case complexity
of Gauss' algorithm was later studied more precisely by Vallée [16].

In 1982, Lenstra, Lenstra and Lovász [10] gave a powerful approximation
reduction algorithm for lattices of arbitrary dimension. Their famous algorithm,
called LLL, was an important breakthrough for numerous theoretical and practical
problems in computational number theory and cryptography [13,6,8]. The
LLL algorithm seems difficult to analyze precisely, both in the worst case [2,9,10]
and in the average case [1,3,4]. In particular, when the dimension is higher than two,
the problem of the real worst case of the algorithm is completely open. However,
LLL-like reduction algorithms are so widely used in practice that their analysis
is a real challenge, both from a theoretical and a practical point of view. To finish
this brief presentation, we recall that the LLL algorithm is a possible generalization
of its 2-dimensional version, which is the Gaussian algorithm. Moreover, the
Gaussian algorithm is intensively used (as a black box) by the LLL algorithm.

In this paper, we propose a new approach to the worst-case analysis of LLL-like
lattice reduction algorithms. For the moment this approach is presented only
in two dimensions. We have to observe here that the worst case of some variant of
the Gaussian algorithm is already known: in [16], Vallée studied a variant of this
algorithm whose elementary transforms are some integer matrices of determinant
equal to 1. In the case of the usual Gaussian algorithm, elementary transforms
are integer matrices of determinant either 1 or −1. Even if our paper generalizes
[16] to the case of the usual Gaussian algorithm, we do not consider this as its
most important point. Our aim here is to present our new approach.

An LLL-like lattice reduction algorithm or a sorting algorithm uses some
atomic transforms. In both cases, the monoid of finite sequences of atomic
transforms is a group. A trace of execution of the algorithm is always a sequence of
atomic transforms. But not every such sequence is a trace of the algorithm.
We exhibit a family of rewriting rules over the group generated by the
atomic transforms corresponding to the mechanism of the algorithm: the rewriting
rules make some sequences forbidden in the sense that possible executions
will be exactly normal forms of the rewrite system. Thus the length of a valid
word (or a normal form or a reduced word) over the set of generators, i.e., the
number of atomic transforms that compose the word, becomes very close to the
number of steps of the algorithm. In this paper, we present some rewrite systems
over $GL_2(\mathbb{Z})$ and over the permutation group, which let us predict how
the Gaussian algorithm and some sorting algorithms run on an arbitrary
input.

Then we consider the variation of the length of the input with respect to the
length of the reduced word issued by the trace of the algorithm running on that
input. In the case of the reduction algorithm, an input is a basis of a lattice.
The length of an input is naturally related to the number of bits needed to store
it. For an input basis it is for instance the sum of the squares of the lengths of the
basis vectors. We exhibit inputs whose length is minimal among all inputs
requiring a given number of steps of the algorithm. We deduce from this the
worst-case configuration of the usual Gaussian algorithm and give an "optimal"
bound for the number of steps.

Let us explain this last point more precisely. Usually when counting the
number of steps of an algorithm, one considers all inputs of length less than a
fixed bound, say $M$. Then one estimates the maximum number of steps taken
over all these inputs by¹:

$$f(M) := \max_{\text{all inputs of length at most } M} \; \text{number of steps of the algorithm}. \qquad (1)$$

Here, to exhibit the precise real worst case, we first proceed in "the opposite
way". Consider $k$, a fixed number of steps. We will estimate the minimum length
of those inputs demanding at least $k$ steps to be processed by the algorithm:

$$g(k) := \min_{\text{all inputs demanding at least } k \text{ steps}} \; \text{length of the input}. \qquad (2)$$

Clearly $f(g(k)) = k$. Otherwise there would be an input of length less than $g(k)$
demanding more than $k$ steps. But $g(k)$ is by definition the minimal length of
such inputs. So by inverting the function $g$, we can compute $f$.
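
The relation between $f$ and $g$ can be observed by brute force on a toy example. The sketch below deliberately uses Euclid's gcd algorithm as a stand-in (it is not the Gaussian algorithm studied in this paper), takes pairs $a \ge b \ge 1$ as inputs, and assumes the input length is $a + b$; all function names are illustrative.

```python
def euclid_steps(a, b):
    """Number of division steps performed by Euclid's algorithm on (a, b)."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps


def inputs_of_length(n):
    """All pairs (a, b) with a >= b >= 1 and a + b == n."""
    return [(n - b, b) for b in range(1, n // 2 + 1)]


def f(max_len):
    """Maximum number of steps over all inputs of length at most max_len."""
    return max((euclid_steps(a, b)
                for n in range(2, max_len + 1)
                for a, b in inputs_of_length(n)), default=0)


def g(k, search_limit=200):
    """Minimum length of an input demanding at least k steps."""
    for n in range(2, search_limit + 1):
        if any(euclid_steps(a, b) >= k for a, b in inputs_of_length(n)):
            return n
    raise ValueError("no input found below search_limit")


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, g(k), f(g(k)))
```

For this toy algorithm the minimal lengths grow like Fibonacci numbers, and evaluating $f$ at $g(k)$ returns $k$, which is the inversion used in the text.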

Plan of the paper. Section 2 introduces the Gaussian algorithm and outlines 
our method. Section 3 is the crucial point of our method: We identify all the 
executions of the Gaussian algorithm with normal forms of 4 rewrite systems. 
Section 4 exhibits some particular inputs whose length is minimal among the 
lengths of all inputs requiring at least k steps. Then we recall a result of [16] 
that estimates the length of the particular basis exhibited before and deduce an 
upper-bound for the maximal number of steps of the Gaussian algorithm with 
respect to the length of the input. Finally in Section 5 our method is briefly 
applied to three sorting algorithms: For each sorting algorithm, all possible ex- 
ecutions are identified with normal forms of a rewrite system. 



2 Gaussian Algorithm and the New Approach to Its 
Worst-Case Analysis 

Let $\mathbb{R}^2$ be endowed with the usual scalar product $(\cdot,\cdot)$ and Euclidean length
$|u| = (u,u)^{1/2}$. A two-dimensional lattice is a discrete additive subgroup of $\mathbb{R}^2$.
Equivalently, it is the set of all integer linear combinations of two linearly
independent vectors. Generally it is given by one of its bases $(b_1, b_2)$. Let $(e_1, e_2)$
be the canonical basis of $\mathbb{R}^2$. We often associate to a lattice basis $(b_1, b_2)$ a matrix
$B$, such that the vectors of the basis are the rows of the matrix:

$$B = \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix}, \quad \text{where } b_i = b_{i,1} e_1 + b_{i,2} e_2. \qquad (3)$$

¹ When dealing with a non-trivial algorithm, $f$ is always an increasing function.
The length $\ell$ of the previous basis (or the length of the matrix $B$) is defined here
by $\ell(B) := |b_1|^2 + |b_2|^2$.

The usual Gram-Schmidt orthogonalization process builds, in polynomial time,
from a basis $b = (b_1, b_2)$ an orthogonal basis $b^* = (b_1^*, b_2^*)$ and a lower-triangular
matrix $M$ that expresses the system $b$ into the system $b^*$.² Let $m$ be equal to
$(b_2, b_1)/(b_1, b_1)$. By construction, the following equalities hold:

$$b_1^* = b_1, \qquad b_2^* = b_2 - m\, b_1, \qquad \text{and} \qquad M = \begin{pmatrix} 1 & 0 \\ m & 1 \end{pmatrix}. \qquad (4)$$

The ordered basis $B = (b_1, b_2)$ is called proper if the quantity $m$ satisfies

$$-1/2 \le m < 1/2. \qquad (5)$$
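
The quantities of the Gram-Schmidt step are easy to compute exactly. The following sketch assumes integer input vectors and $m = (b_2, b_1)/(b_1, b_1)$, and tests the properness condition with the left endpoint $-1/2$ included; the function names are illustrative.

```python
from fractions import Fraction


def gram_schmidt_2d(b1, b2):
    """Return (m, b2_star) with m = (b2, b1) / (b1, b1) and
    b2_star = b2 - m * b1, computed exactly with rationals."""
    m = Fraction(b2[0] * b1[0] + b2[1] * b1[1], b1[0] ** 2 + b1[1] ** 2)
    b2_star = (b2[0] - m * b1[0], b2[1] - m * b1[1])
    return m, b2_star


def is_proper(b1, b2):
    """True if the ordered basis (b1, b2) satisfies -1/2 <= m < 1/2."""
    m, _ = gram_schmidt_2d(b1, b2)
    return Fraction(-1, 2) <= m < Fraction(1, 2)


if __name__ == "__main__":
    m, b2_star = gram_schmidt_2d((2, 0), (1, 3))
    print(m, b2_star)                  # m is 1/2, so this basis is not proper
    print(is_proper((2, 0), (-1, 3)))  # m is -1/2, which is allowed
```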



There is a natural representative of all the bases of a given two-dimensional
lattice. This basis is composed of two shortest vectors generating the whole
lattice. It is called the Gauss-reduced basis and the Gaussian algorithm outputs
this reduced basis running on any basis of the lattice. Any lattice basis in two
dimensions can always be expressed as

$$B = U R, \qquad (6)$$

where $R$ is the so-called Gaussian reduced basis of the same lattice and $U$ is
a unimodular matrix, i.e., an element of $GL_2(\mathbb{Z})$. The goal of a reduction
algorithm, the Gaussian algorithm in two dimensions, is to find $R$ given $B$. The
Gaussian algorithm uses two kinds of elementary transforms, explained in the
sequel of this paper. Let $(b_1, b_2)$ be an input basis of a lattice and $B$ the matrix
expressing $(b_1, b_2)$ in the canonical basis of $\mathbb{R}^2$ as specified by (3).

The algorithm first makes an integer translation of $b_2$ in the direction of $b_1$
in order to make $b_2$ as short as possible. This is done just by computing the
integer $x$ nearest to $m = (b_2, b_1)/(b_1, b_1)$ and replacing $b_2$ by $b_2 - x b_1$. Notice
that, after this integer translation, the basis $(b_1, b_2)$ is proper.

The second elementary transform is just the swap of the vectors $b_1$ and $b_2$
in case, after the integer translation, we have $|b_1| > |b_2|$. The algorithm
iterates these transforms until, after the translation, $b_1$ remains no longer
than $b_2$, i.e., $|b_1| \le |b_2|$.
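
The two elementary transforms translate directly into code. The sketch below is a minimal illustration for integer input vectors, not the authors' implementation: it rounds $m$ to the integer $x$ with $-1/2 \le m - x < 1/2$ (so that the translated basis is proper), swaps while the first vector is strictly longer, and records the successive values of $x$. The function name `gauss_reduce` is an assumption of the example.

```python
def dot(u, v):
    """Usual scalar product of two plane vectors."""
    return u[0] * v[0] + u[1] * v[1]


def gauss_reduce(b1, b2):
    """Run the Gaussian algorithm on the integer basis (b1, b2).

    Returns (b1, b2, xs): the reduced basis and the list of integers x
    used by the successive translations b2 <- b2 - x * b1.
    """
    xs = []
    while True:
        num, den = dot(b2, b1), dot(b1, b1)
        # Integer x with -1/2 <= num/den - x < 1/2, using exact arithmetic.
        x = (2 * num + den) // (2 * den)
        b2 = (b2[0] - x * b1[0], b2[1] - x * b1[1])
        xs.append(x)
        if dot(b1, b1) > dot(b2, b2):
            b1, b2 = b2, b1          # swap and iterate
        else:
            return b1, b2, xs        # |b1| <= |b2|: the basis is reduced


if __name__ == "__main__":
    print(gauss_reduce((5, 3), (3, 2)))
```

On the basis ((5, 3), (3, 2)) of the integer lattice, the run uses three translations and two swaps and ends with two unit vectors.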

The Gaussian algorithm can also be regarded (especially for the analysis
purposes) as an algorithm that gives a decomposition of the unimodular matrix
$U$ of relation (6) by means of some basic transforms:

Input: $B = U R$.

Output: $R = T^{x_{k+1}} S T^{x_k} S T^{x_{k-1}} \cdots S T^{x_2} S T^{x_1} B$.   (7)

where the matrix $T$ corresponds to an integer translation of $b_2$ in the direction
of $b_1$ by one and the matrix $S$ represents a swap:

$$S = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad \text{and} \qquad T = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}. \qquad (8)$$

² Of course, $b^*$ is generally not a basis for the lattice generated by $b$.

Each step of the algorithm is indeed an integer translation followed by a swap,
represented by³ $S T^x$, $x \in \mathbb{Z}^*$.

Writing the output as in (7) shows not only the output but how precisely the 
algorithm is working since T and S represent the only elementary transforms 
made during the execution of the Gaussian algorithm. 
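
A trace of the algorithm can be replayed as a matrix product to check the form (7). The sketch below assumes that $S$ is the 2×2 swap matrix and that $T^x$ is the row operation sending $b_2$ to $b_2 + x b_1$, so the exponent recorded for a translation $b_2 \leftarrow b_2 - q b_1$ is $-q$; these conventions and the names `gaussian_trace` and `replay` are assumptions of the example.

```python
def matmul(a, b):
    """Product of two 2x2 integer matrices given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]


S = [[0, 1], [1, 0]]            # swap of the two rows


def T(x):
    """Assumed form of T^x: adds x times the first row to the second."""
    return [[1, 0], [x, 1]]


def gaussian_trace(B):
    """Run the Gaussian algorithm on the rows of B.
    Returns (R, exps) with exps = [x_1, ..., x_{k+1}]."""
    b1, b2 = B
    exps = []
    while True:
        num = b2[0] * b1[0] + b2[1] * b1[1]
        den = b1[0] ** 2 + b1[1] ** 2
        q = (2 * num + den) // (2 * den)
        b2 = [b2[0] - q * b1[0], b2[1] - q * b1[1]]
        exps.append(-q)         # T^(-q) maps b2 to b2 - q * b1
        if den > b2[0] ** 2 + b2[1] ** 2:
            b1, b2 = b2, b1
        else:
            return [b1, b2], exps


def replay(B, exps):
    """Compute T^{x_{k+1}} S T^{x_k} ... S T^{x_1} B from the exponents."""
    M = matmul(T(exps[0]), B)
    for x in exps[1:]:
        M = matmul(T(x), matmul(S, M))
    return M


if __name__ == "__main__":
    B = [[5, 3], [3, 2]]
    R, exps = gaussian_trace(B)
    print(exps, R, replay(B, exps) == R)
```

Replaying the recorded exponents on the input reproduces the reduced basis, so the list of exponents fully describes the execution.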

So when studying the mechanism of a reduction algorithm in two dimensions
and for a fixed reduced basis $R$, the algorithm can be regarded as a decomposition
algorithm over $GL_2(\mathbb{Z})$. The integer $k+1$ in (7) denotes the number of steps.
Indeed the algorithm terminates [2,7,16]. The unimodular group in two dimensions
$GL_2(\mathbb{Z})$ has already been studied [11,12,14,15] and it is well known that
$\{S, T\}$ is a possible family of generators for $GL_2(\mathbb{Z})$. Of course there are relators
associated to these generators and there is no uniqueness of the decomposition
of an element of $GL_2(\mathbb{Z})$ in terms of $S$ and $T$. But the Gaussian algorithm gives
one precise decomposition among these possible ones. When the algorithm is running on
an input $UR$, the decomposition of $U$ could a priori depend on the reduced basis
$R$. We will show that the decomposition of $U$ does not depend strongly on the
reduced basis $R$: the next fact divides all reduced bases of $\mathbb{R}^2$ into 4 classes.
Inside one fixed class of reduced bases the decomposition of $U$ output by the
Gaussian algorithm does not depend at all on the reduced basis $R$.

Fact. Let $R = (b_1, b_2)$ be any reduced basis of $\mathbb{R}^2$. Then one of the following
cases occurs:

$|b_1| < |b_2|$ and $m \ne -1/2$;   (9)
$|b_1| = |b_2|$ and $m \ne -1/2$;   (10)
$|b_1| < |b_2|$ and $m = -1/2$;   (11)
$|b_1| = |b_2|$ and $m = -1/2$.   (12)


In the sequel we completely characterize the decomposition of a unimodular
matrix output by the Gaussian algorithm and we will call it the Gaussian
decomposition of a unimodular matrix. Roughly speaking, we exhibit forbidden
sequences of values for the $x_i$'s.

More precisely, we exhibit in Section 3 a set of rewriting rules that leads to
the formulation output by the Gaussian algorithm, from any product of matrices
involving $S$ and $T$. The precise characterization of the Gaussian decomposition
that we give reveals the slowest way the length of a unimodular matrix
can grow with respect to its Gaussian decomposition: we consider unimodular
matrices whose length of Gaussian decomposition is fixed, say $k$:

$$U := T^{x_{k+1}} S T^{x_k} S T^{x_{k-1}} \cdots S T^{x_2} S T^{x_1}.$$

We exhibit in Section 4 the Gaussian word of length $k$ with minimal length. We
naturally deduce the minimum length $g(k)$ of all inputs demanding at least $k$
steps. Finally by "inverting" the function $g$ we find the maximum number of
steps of the Gaussian algorithm.

³ A priori $x_1$ and $x_{k+1}$ in (7) may be zero, so the algorithm may start with a swap or
end with a translation.
3 The Gaussian Decomposition of a Unimodular Matrix

Let $\Sigma$ be a (finite or infinite) set. A word $\omega$ on $\Sigma$ is a finite sequence $\alpha_1 \alpha_2 \cdots \alpha_n$
where $n$ is a positive integer, and $\alpha_i \in \Sigma$ for all $i \in \{1, \ldots, n\}$. Let $\Sigma^*$ be the
set of finite words on $\Sigma$. We introduce for convenience the empty word and we
denote it by 1.

Consider the alphabet $\Sigma = \{S, T, T^{-1}\}$. We recall that the Gaussian decomposition
of a unimodular matrix $U$ is the decomposition of $U$ corresponding to
the trace of the algorithm when running on an input basis $B = UR$ where $R$
is a reduced basis. In the sequel we show that there are at most 4 Gaussian
decompositions for a unimodular matrix $U$. In the following subsections, 4 sets
of rewriting rules depending on the form of $R$ are given. Any word in which
none of these rewriting rules can be applied is proven to be Gaussian. Since the
results of these subsections are very similar, we only give sketches⁴ of proofs for
Subsection 3.1.

3.1 The Basis R Is Such That $|b_1| < |b_2|$ and $m \ne -1/2$

We say that a word $\omega$ is a normal form or reduced word or a reduced decomposition
of the unimodular matrix $U$, if $\omega$ is a decomposition of $U$ in which none of the
rewriting rules of Theorem 1 can be applied.

Theorem 1 shows that the Gaussian decomposition and a reduced decomposition
of a unimodular matrix are the same. By recalling that the Gaussian
algorithm is deterministic, i.e., for an input basis $B$, there is a couple $(U, R)$ such
that $U$ and $R$ are output by the Gaussian algorithm, the next theorem shows
also that the reduced decomposition of a unimodular matrix is unique.

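Theorem 1. Let $\omega_1$ be any decomposition of $U$ in terms of the family of generators
$\{S, T\}$. The Gaussian decomposition of $U$ is obtained from $\omega_1$ by applying
repeatedly the following set of rules:

$$S^2 \to 1; \qquad (13)$$

$$T^x T^y \to T^{x+y}; \qquad (14)$$

$$\forall x \in \mathbb{Z}_-^*, \quad S T^2 S T^x \to T S T^{-2} S T^{x+1}; \qquad (15)$$

$$\forall x \in \mathbb{Z}_+^*, \quad S T^{-2} S T^x \to T^{-1} S T^{2} S T^{x-1}; \qquad (16)$$

$$\forall x \in \mathbb{Z}^*,\ \forall k \in \mathbb{Z}_+, \quad S T S T^x \prod_{i=k}^{1} S T^{y_i} \to T S T^{-x-1} \prod_{i=k}^{1} S T^{-y_i}; \qquad (17)$$

$$\forall x \in \mathbb{Z}^*,\ \forall k \in \mathbb{Z}_+, \quad S T^{-1} S T^x \prod_{i=k}^{1} S T^{y_i} \to T^{-1} S T^{-x+1} \prod_{i=k}^{1} S T^{-y_i}. \qquad (18)$$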



Let us consider the Gaussian algorithm running on inputs $UR$ where $U$ is any
unimodular matrix and $R$ a reduced basis $(b_1, b_2)$ satisfying (9). As explained
in the last section, an execution of the Gaussian algorithm is always expressed as
a (finite) word on the alphabet $\Sigma = \{S, T, T^{-1}\}$. As a direct consequence of the
previous theorem, a word on this alphabet is associated to a possible execution
if and only if the word is a normal form of the previous rewrite system.

The trivial rules (13) and (14) have to be applied whenever possible. So any
word $\omega_1$ on the alphabet $\Sigma$ can trivially be written as

$$T^{x_{k+1}} \prod_{i=k}^{1} S T^{x_i}, \qquad (19)$$

with $x_i \in \mathbb{Z}^*$ for $2 \le i \le k$ and $(x_1, x_{k+1}) \in \mathbb{Z}^2$. The integer $k$ is called the length⁵
of $\omega_1$. Notice that usually the length of a word is the number of its letters, which
would be here equal to $2k + 1$. Here the length is $k$, which corresponds to the
number of iterations of the algorithm (possibly minus 1).

The proof of Theorem 1 is based on the next Lemmata. 

Lemma 1. Let $\omega_1$ be a word as in (19). Then the rewriting process of Theorem 1
always terminates.

Proof (Sketch of the proof). Let $k$ be a nonnegative integer, and let $x_1, \ldots, x_{k+1}$
be integers such that $x_2, \ldots, x_k$ are nonzero. Let $\omega_1$ be a word on $\{S, T\}$
expressed by $\omega_1 = T^{x_{k+1}} \prod_{i=k}^{1} S T^{x_i}$. We consider the index sets
$S_1 := \{i : 2 \le i \le k \text{ and } |x_i| = 1\}$ and $S_2 := \{i : 2 \le i \le k,\ x_i x_{i-1} < 0 \text{ and } |x_i| = 2\}$.
Finally, for a word $\omega_1$, the quantity $d(\omega_1)$ is $\sum_{i \in S_1 \cup S_2} i$. The lemma is shown by
induction on the length of $\omega_1$ and on the integer quantity $d(\omega_1)$.

The next lemma is crucial in the proof of Theorem 1. Indeed, as a direct
corollary of the next lemma, normal forms of the rewrite system proposed in
Theorem 1 are possible traces of the Gaussian algorithm. Let us observe that
the proof of Lemma 2 is closely tied to the mechanism of the algorithm: even
a slight modification of the Gaussian algorithm may make the lemma fail.

Lemma 2. Let $B$ be the matrix of a proper basis $(b_1, b_2)$ (see (3), () and (5)).
Let $x \in \mathbb{Z}^*$ be a nonzero integer and let $\tilde{B}$ be defined by $\tilde{B} = (\tilde{b}_1, \tilde{b}_2) := S T^x B$.

1. If $|x| \ge 3$, then $\tilde{B}$ is still proper. Moreover,

   - if $(b_1, b_2)$ and $x$ are both positive or both negative, then $\tilde{B}$ is proper
     whenever $|x| \ge 2$;

   - if $B$ is reduced, $|b_1| < |b_2|$ and $m \ne -1/2$, then $\tilde{B}$ is proper for all
     $x \in \mathbb{Z}^*$.

2. If $|x| \ge 2$, then $|\tilde{b}_2| < |\tilde{b}_1|$. Moreover, if $(b_1, b_2)$ and $x$ are both positive or
   both negative, this is true provided that $|x| \ge 1$.

3. If $|x| \ge 2$, then $\max(|\tilde{b}_1|, |\tilde{b}_2|) > \max(|b_1|, |b_2|)$.

4. If $|x| \ge 1$, then $(\tilde{b}_1, \tilde{b}_2)$ and $x$ are both positive or both negative.

Footnotes.

The length of a word which is a decomposition of a unimodular matrix has of course
to be distinguished from what we call the length of a unimodular matrix, which is
closely related to the number of bits needed to store the matrix.

Of course, saying that the rewriting process presented by the previous theorem
always terminates has a priori nothing to do with the well-known fact that the
Gaussian algorithm always terminates.

A detailed proof of Lemma 1 is available in the full version of the paper.

Let us explain how the previous lemma shows that a normal form is a possible
trace of the algorithm. First consider a proper (see (5)) and non-reduced basis
$P$. Given to the Gaussian algorithm, the first operation done will be the swap.
Now for any non-proper basis $B$ there is a unique integer $x$ and a unique proper
basis $P$ such that $B$ is expressed as $T^x P$. So if $B$ is given to the algorithm, the
first operation done will be $T^{-x}$. Now consider for instance $B' = S T^x P$, where
$P$ is a proper basis. The previous lemma asserts that $B'$ is also proper and
non-reduced. So if $B'$ is given to the algorithm, the first operation is the swap. More
generally, if

$\omega := T^{x_{k+1}} S T^{x_k} S T^{x_{k-1}} \cdots S T^{x_1}$

is a normal form of the rewrite system and $R$ a reduced basis satisfying (9), then
thanks to the previous lemma all the intermediate bases $S T^{x_1} R$, $S T^{x_2} S T^{x_1} R$,
..., $S T^{x_k} \cdots S T^{x_1} R$ are proper. So the algorithm will perform exactly the
following sequence of operations:

$T^{-x_{k+1}} S T^{-x_k} \cdots S T^{-x_2} S T^{-x_1}$.



Corollary 1. Any normal form of the rewrite system defined in Theorem 1 is 
Gaussian. 
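To make the notion of a trace concrete, here is a minimal sketch of the classical two-dimensional Gaussian (Lagrange) reduction that records each translation and each swap it performs. It is an illustration only: the exact conventions of the paper (which vector is translated, how ties in rounding are broken, the precise definitions of "proper" and "reduced") lie outside this section, so the convention used below is an assumption, and the names `gauss_reduce` and `dot` are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    """Euclidean scalar product of two integer vectors of the plane."""
    return u[0] * v[0] + u[1] * v[1]


def gauss_reduce(b1, b2):
    """Reduce the integer basis (b1, b2); return (b1, b2, trace).

    The trace lists the operations in the order they are performed:
    'T^x' for a translation b2 <- b2 + x*b1, and 'S' for a swap.
    """
    trace = []
    while True:
        # Nearest integer to the projection coefficient of b2 on b1.
        x = round(Fraction(dot(b1, b2), dot(b1, b1)))
        if x != 0:
            b2 = (b2[0] - x * b1[0], b2[1] - x * b1[1])
            trace.append(f"T^{-x}")
        if dot(b2, b2) < dot(b1, b1):
            b1, b2 = b2, b1          # the swap S
            trace.append("S")
        else:
            return b1, b2, trace
```

Each swap strictly decreases the squared norm of the first vector, which is a positive integer, so the loop terminates. On exit the first vector is no longer than the second and the projection coefficient has absolute value at most 1/2.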

Proof (Sketch of the proof of Theorem 1). Consider an input $B = UR$, where $U$
is a unimodular matrix and $R$ is a reduced basis of a lattice $L$ satisfying (9).

Lemma 1 asserts that for any decomposition $\omega$ of $U$ in terms of $S$ and $T$,
there is a normal form $\omega'$ (and a unimodular matrix $U' = \omega'$). Notice that the
lemma does not show the uniqueness of the normal form.

Corollary 1 of Lemma 2 asserts that normal forms of the rewrite system are
Gaussian words (traces of the Gaussian algorithm).

Now observe that the use of a nontrivial rewriting rule changes a basis of the
lattice into another basis of the same lattice, and the way the basis is changed is
totally explicit. So for an input $UR$ there is a couple $(U', R')$ such that

(i) $UR = U'R'$,

(ii) the matrix $U'$ is unimodular and its decomposition is a normal form of
the rewrite system,

(iii) and $R'$ is also a reduced basis of the same lattice $L$.

Finally, since the Gaussian algorithm is deterministic, the decomposition
$\omega'$ of $U'$ is the trace of the Gaussian algorithm when running on $B = UR$.
(Of course the output is $R'$.)

Proofs of Theorems 2, 3 and 4, which are presented in the following subsections,
are very similar.

3.2 The Basis R Is Such That $|b_1| = |b_2|$ and $m \ne -1/2$

Theorem 2. Let $\omega_1$ be any decomposition of $U$ in terms of the family of
generators $\{S, T\}$. The Gaussian decomposition of $U$ is obtained from $\omega_1$ by applying
repeatedly the set of rules (13) to (18) of Theorem 1, together with the following
rules:

$\omega S \to \omega$; (20)

$\omega S T \to \omega T S T^{-1}$. (21)



3.3 The Basis R Is Such That $|b_1| < |b_2|$ and $m = -1/2$



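Theorem 3. Let $R$ be a reduced basis and let $U$ be a unimodular matrix, i.e.,
an element of $GL_2(\mathbb{Z})$. Let $\omega_1$ be any decomposition of $U$ in terms of the family
of generators $\{S, T\}$. The Gaussian decomposition of $U$ is obtained from $\omega_1$ by
applying repeatedly the rules (13) to (16) of Theorem 1 together with the rules
(22) and (23) defined here, until none of these rules applies. Then, if we have
$\omega_1 = \omega S T^2 S$, the ending rule (24) applies once and the rewriting process is
over.

$\forall x \in \mathbb{Z}^*, \forall k \in \mathbb{Z}_+$, $\quad S T S T^x \prod_{i=k}^{1} S T^{y_i} \to T S T^{-x-1} \prod_{i=k}^{1} S T^{-y_i}\, T$; (22)

$\forall x \in \mathbb{Z}^*, \forall k \in \mathbb{Z}_+$, $\quad S T^{-1} S T^x \prod_{i=k}^{1} S T^{y_i} \to T^{-1} S T^{-x+1} \prod_{i=k}^{1} S T^{-y_i}\, T$; (23)

$\omega S T^2 S \to \omega T S T^{-2} S T$. (24)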



3.4 The Basis R Is Such That $|b_1| = |b_2|$ and $m = -1/2$

Theorem 4. Let $R$ be a reduced basis and let $U$ be a unimodular matrix, i.e.,
an element of $GL_2(\mathbb{Z})$. Let $\omega_1$ be any decomposition of $U$ in terms of the family
of generators $\{S, T\}$. The Gaussian decomposition of $U$ is obtained from $\omega_1$ by
applying repeatedly Rules (13) to (16) of Theorem 1, together with Rules (22),
(23) and (20) and the following set of rules:

$\omega S T \to \omega T$; (25)

$\omega S T^2 \to \omega T S T^{-1}$. (26)

4 The Length of a Unimodular Matrix with Respect 
to Its Gaussian Decomposition and the Maximum 
Number of Steps of the Algorithm 

Let $B = (b_1, b_2)$ be a basis. The length of $B$, denoted by $\ell(B)$, is the sum of
the squares of the norms of its vectors, that is, $\ell(B) = |b_1|^2 + |b_2|^2$.

The easy but tedious proof of the following theorem is given in the full
version of the paper.

Theorem 5. Let $R = (b_1, b_2)$ be a reduced basis, let $k$ be a positive integer,
and let $x_1, \ldots, x_{k+1}$ be integers such that the word $\omega = T^{x_{k+1}} \prod_{i=k}^{1} S T^{x_i}$ is Gaussian.
Then the following properties hold:

1. if $|b_1| < |b_2|$ and $m \ge 0$ then $\ell(\omega R) \ge \ell((ST^{2})^{k-1} S R)$;

2. if $|b_1| < |b_2|$ and $-1/2 < m < 0$ then $\ell(\omega R) \ge \ell((ST^{-2})^{k-1} S R)$;

3. if $|b_1| < |b_2|$ and $m = -1/2$ then $\ell(\omega R) \ge \ell((ST^{2})^{k-1} S T R)$;

4. if $|b_1| = |b_2|$ then $\ell(\omega R) \ge \ell((ST^{-2})^{k-1} S T^{-1} R)$.

The previous theorem provides bases whose lengths are minimal among all
bases requiring at least $k$ iterations of the Gaussian algorithm. These are
essentially $(ST^2)^{k-1} S R$, where $R$ is a reduced basis. We then have to lower bound
the length $\ell((ST^2)^{k-1} S R)$. In the next section, we just recall how to evaluate
such a length.

The next lemma is exactly Lemma 4 of [16]. A sketch of the proof is recalled
in the full version of the paper.

Lemma 3. Let $k > 2$ be a fixed integer. There exists an absolute constant $A$ such
that any input basis demanding more than $k$ steps of the Gaussian algorithm has
a length greater than $A(1 + \sqrt{2})^{2k-1}$.

It follows that any input with length less than $A(1 + \sqrt{2})^{2k-1}$ demands
less than $k$ steps. We deduce the following corollary.

Corollary 2. There is an absolute constant $A$ such that the number of steps of
the Gaussian algorithm on inputs of length less than $M$ is bounded from above
by

$\frac{1}{2}\left(\log_{1+\sqrt{2}}\left(\frac{M}{A}\right) + 1\right)$.

5 Sorting Algorithms 

In the previous sections, we proposed a method for the worst-case analysis of the
Gaussian algorithm. We hope to generalize the approach to the LLL algorithm
in higher dimensions (a problem that is still open even in three dimensions). On the
other hand, our method can be applied to other families of algorithms. In this
section we consider the bubble sort algorithm (in the full version of the
paper we also consider the insertion sort and the selection sort algorithms). Of
course, the worst cases of these sorting algorithms are very well known. Here the aim
is to use our method to recover these well-known worst cases.

A sorting algorithm (like the Gaussian algorithm) uses some atomic
transforms. Once more the monoid of finite sequences of atomic transforms is a group:
the permutation group plays here the role played by $GL_2(\mathbb{Z})$ in Section 3.

For each sorting algorithm considered, we propose a set of rewriting rules
over the group of permutations represented by a family of generators that is
precisely the set of atomic transforms of the algorithm. Clearly an execution of
the algorithm is a finite word on the alphabet of these atomic transforms. But
not every such word is an execution of the algorithm. We prove that
a word on the alphabet of atomic transforms is associated to an execution of
the algorithm if and only if the word is a normal form of the rewrite system we
propose.

In a second step, as for the analysis of the Gaussian algorithm, we have to
consider the variation of the length of the input with respect to the length of
the reduced word issued by the trace of the algorithm running on that input. So
in our method we have to deal with two notions of length: the length of normal
forms, which counts the number of iterations of the algorithm, and the length of
the inputs, which is classically associated to the number of bits needed to store the input.

Let us observe that here this step is essentially trivial. Indeed, when running on
$n$ items to be sorted, the length of the input of a sorting algorithm is always $n$ (at
least in usual analyses), no matter how many steps are needed to sort the $n$ input
items. In other words, for a sorting algorithm the length of the input is
constant and does not depend on the length of the normal form associated to
this input, as it did when considering the Gaussian algorithm.

So clearly the longest length of the normal forms is here exactly the maximal
number of iterations of the sorting algorithm running on inputs of length $n$.

The sketch of the proof is the same as for the Gaussian algorithm:
we first prove that the rewriting process always terminates. Then, we show
that the reduced words are also the normal forms given by the corresponding
algorithm.

In the sequel, we first recall some useful definitions and notations. Then we
analyze the bubble sort algorithm. The rewrite systems for the insertion and
selection sort algorithms, which are close to the rewrite system associated to the
bubble sort, are given in the full version of this paper.

Let $n$ be a positive integer, and let $[1, \ldots, n]$ be the sorted list of the first $n$
positive integers. Let $\mathfrak{S}_n$ be the set of all permutations on $[1, \ldots, n]$, and let $\mathfrak{S}$
be the set of all permutations on a list of distinct integers of variable size. Let us
denote by $t_i$ the transposition which swaps the elements in positions $i$ and $i + 1$
in the list, for all $i \in \{1, \ldots, n\}$. Any permutation can be written in terms of
the $t_i$'s. Let $\Sigma_n$ be defined by $\Sigma_n = \{t_1, \ldots, t_n\}$ and let $\Sigma$ denote $\Sigma = \{t_i : i \in \mathbb{N}^*\}$.
Thus $\Sigma_n$ (resp. $\Sigma$) is a generating set of $\mathfrak{S}_n$ (resp. $\mathfrak{S}$).

As in previous sections, any word $\omega$ on $\Sigma$ will be denoted as follows:

$\omega = t_{i_1} t_{i_2} \cdots t_{i_k} = \prod_{j=1}^{k} t_{i_j}$,

where $k$ and $i_1, \ldots, i_k$ are positive integers.

Definition 1. Let $\omega_1 = t_{i_1} t_{i_2} \cdots t_{i_k}$, $\omega_2 = t_{j_1} t_{j_2} \cdots t_{j_l}$ and $\omega_3 = t_{r_1} t_{r_2} \cdots t_{r_m}$ be
words on $\Sigma$.

1. The length of $\omega_1$, denoted by $|\omega_1|$, is $k$;

2. the distance between $\omega_1$ and $\omega_2$, denoted by $\mathrm{Dist}(\omega_1, \omega_2)$, is given by
   $\min_{t_i \in \omega_1,\, t_j \in \omega_2} |i - j|$;

3. the maximum (resp. minimum) of $\omega_1$, denoted $\max(\omega_1)$ (resp. $\min(\omega_1)$),
   is given by $\max_{t_i \in \omega_1}(i)$ (resp. $\min_{t_i \in \omega_1}(i)$);

4. $\omega_1$ is an increasing (resp. decreasing) word if $i_p < i_{p+1}$ (resp. $i_p > i_{p+1}$),
   for all $p \in \{1, \ldots, k - 1\}$;

5. $\omega_2$ is a maximally increasing factor of $\omega_1 \omega_2 \omega_3$ if $\omega_2$ is increasing and both
   $t_{i_k} \omega_2$ and $\omega_2 t_{r_1}$ are not increasing.

Any word $\omega$ on $\Sigma$ is uniquely expressed in the form

$\omega = \omega_1 \omega_2 \cdots \omega_m$, (27)

where each $\omega_i$ is a maximally increasing factor of $\omega$. We will call (27) the
increasing decomposition of $\omega$. We define $s: \Sigma^* \to \mathbb{N}$ as the map given by the rule
$s(\omega) = m$.
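The increasing decomposition (27) is simply the splitting of a word into its maximal strictly increasing runs of indices. A minimal sketch, assuming a word $t_{i_1} \cdots t_{i_k}$ is represented by the Python list of its indices `[i_1, ..., i_k]` (this encoding and the function names are choices made for the example only):

```python
def increasing_decomposition(word):
    """Split a word, given as the list of indices of its letters t_i,
    into its maximally increasing factors."""
    factors = []
    for i in word:
        if factors and factors[-1][-1] < i:
            factors[-1].append(i)    # the current run is still increasing
        else:
            factors.append([i])      # start a new maximally increasing factor
    return factors


def s(word):
    """Number of factors in the increasing decomposition of the word."""
    return len(increasing_decomposition(word))
```

For example, the word $t_1 t_2 t_3 t_1 t_2 t_1$ decomposes into the three factors $t_1 t_2 t_3$, $t_1 t_2$ and $t_1$, so `s([1, 2, 3, 1, 2, 1])` is 3.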

The basic idea of the bubble sort algorithm is the following: pairs of adjacent
values in the list to be sorted are compared and interchanged if they are out
of order, the process starting from the beginning of the list. Thus, list entries
"bubble upward" in the list until they bump into one with a higher sort value.
The algorithm first compares the first two elements of the list and swaps them if
they are in the wrong order. Then, the algorithm compares the second and the
third elements of the list and swaps them if necessary. The algorithm continues
to compare adjacent elements from the beginning to the end of the list. This
whole process is iterated until no more changes are made.
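The description above can be sketched as follows. The code records, for each swap of the elements in positions $i$ and $i+1$ (counted from 1), the index $i$ of the transposition used; the function name and the list encoding of the trace are illustrative choices, not the paper's notation.

```python
def bubble_sort_trace(items):
    """Bubble sort a copy of `items`; return (sorted_list, trace).

    The trace is the list of indices i (1-based) of the adjacent
    transpositions performed, in the order they are performed.
    """
    a = list(items)
    trace = []
    changed = True
    while changed:                       # repeat passes until nothing moves
        changed = False
        for p in range(len(a) - 1):      # scan from the beginning of the list
            if a[p] > a[p + 1]:
                a[p], a[p + 1] = a[p + 1], a[p]
                trace.append(p + 1)      # positions p+1 and p+2, 1-based
                changed = True
    return a, trace
```

On the reversed list `[3, 2, 1]` the trace is `[1, 2, 1]`, which has length $n(n-1)/2 = 3$, the familiar worst case.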

Let $\sigma$ be a permutation on $[1, \ldots, n]$. There exists a unique decomposition $\omega$
of $\sigma$ on the alphabet $\Sigma$ corresponding to the sequence of elementary transforms
performed by the bubble sort algorithm on $\sigma[1, \ldots, n]$. We will call it the bubblian
decomposition of $\sigma$. Notice that $\omega^{-1}\sigma = 1$. The bubble sort algorithm can be
regarded as an algorithm giving the bubblian decomposition of a permutation.

Definition 2. A word $\omega$ on $\Sigma$ is a bubblian word if it corresponds to a possible
execution of the bubble sort algorithm.

Let us define some rewriting rules on $\Sigma^*$. In the following equations, $i$, $j$ and
$k$ are arbitrary positive integers and $\omega$ is a word on $\Sigma$:

$t_i t_i \to 1$; (28)

if $\mathrm{Dist}(t_{i+1}, \omega) > 1$, $\quad t_{i+1}\, \omega\, t_i t_{i+1} \to \omega\, t_i t_{i+1} t_i$; (29)

if $\mathrm{Dist}(t_i, \omega) > 1$ and $\omega$ is a maximally increasing factor, $\quad \omega\, t_i \to t_i\, \omega$; (30)

if $\mathrm{Dist}(t_j, t_k \omega) > 1$ and $i < j < k$ or $k < i < j$, $\quad t_i t_k\, \omega\, t_j \to t_i t_j t_k\, \omega$. (31)

Theorem 6. Let $\sigma$ be a permutation and let $\omega \in \Sigma^*$ be a decomposition of $\sigma$ on
$\Sigma$. The bubblian decomposition of $\sigma$ is obtained from $\omega$ by applying repeatedly
the rules (28) to (31).

Remark 1. Let $\omega$ and $\omega'$ be words on $\Sigma$. It is well known that a presentation of
$\mathfrak{S}$ on $\Sigma$ is the following:

$t_i t_i = 1$;

$t_i t_j = t_j t_i$;

$t_i t_{i+1} t_i = t_{i+1} t_i t_{i+1}$;

for all positive integers $i$, $j$ such that $|i - j| > 1$. Thus, it is easy to prove that
if $\omega'$ is obtained from $\omega$ by the rewriting rules, then $\omega = \omega'$ in $\mathfrak{S}$.
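The standard relations between adjacent transpositions (each $t_i$ is an involution, $t_i$ and $t_j$ commute when $|i-j| > 1$, and $t_i t_{i+1} t_i = t_{i+1} t_i t_{i+1}$) can be checked directly by letting words act on a list. A minimal sketch, where a word is again encoded as the list of indices of its letters and `apply_word` is a hypothetical helper:

```python
def apply_word(word, size):
    """Apply the transpositions of `word` (1-based indices, left to right)
    to the list [1, ..., size] and return the resulting list.

    The letter t_i swaps the entries in positions i and i + 1,
    so every index in the word must be at most size - 1.
    """
    a = list(range(1, size + 1))
    for i in word:
        a[i - 1], a[i] = a[i], a[i - 1]
    return a


def same_permutation(word1, word2, size):
    """True if the two words act identically on a list of length `size`."""
    return apply_word(word1, size) == apply_word(word2, size)
```

For example, `same_permutation([1, 2, 1], [2, 1, 2], 3)` is `True`, while `same_permutation([1, 2], [2, 1], 3)` is `False` because $t_1$ and $t_2$ do not commute.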

The proof of Theorem 6 is very similar to the proof of Theorem 1. It is
based on the following lemmata.

The next lemma shows first that the rewrite process terminates: given a
permutation and any decomposition of this permutation in terms of the $t_i$'s, one
obtains a normal form of the rewrite system represented by rules (28), (29), (30)
and (31) by applying the rules finitely many times (in an arbitrary order). For
any word $\omega = t_{a_1} \cdots t_{a_{|\omega|}} \in \Sigma^*$, we define the two quantities $l(\omega)$ and $h(\omega)$ by

$l(\omega) = \sum_{i=1}^{|\omega|} a_i$ and $h(\omega) = \sum_{i=1}^{s(\omega)} (s(\omega) - i)(\max(\omega) - |\omega_i|)$,

where $\omega_1 \omega_2 \cdots \omega_m$ is the increasing decomposition of $\omega$. The proof of the next
lemma is a double induction on the positive integer quantities $l(\omega)$ and $h(\omega)$.
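As an illustration of the two quantities used in the termination argument, the sketch below computes them for a word encoded as the list of indices of its letters. It assumes the reading $l(\omega) = \sum_i a_i$ and $h(\omega) = \sum_{i=1}^{s(\omega)} (s(\omega) - i)(\max(\omega) - |\omega_i|)$, where the $\omega_i$ are the maximally increasing factors of $\omega$; since the formula for $h$ is hard to read in the source, this reading is an assumption, and the function names are illustrative.

```python
def increasing_runs(word):
    """Maximal strictly increasing runs of the index list `word`."""
    runs = []
    for i in word:
        if runs and runs[-1][-1] < i:
            runs[-1].append(i)
        else:
            runs.append([i])
    return runs


def potentials(word):
    """Return (l, h) for the word, under the reading stated above."""
    if not word:
        return 0, 0
    runs = increasing_runs(word)
    m = len(runs)                    # s(word)
    top = max(word)                  # max(word)
    l = sum(word)
    h = sum((m - i) * (top - len(run)) for i, run in enumerate(runs, start=1))
    return l, h
```

For the word $t_1 t_2 t_3 t_1 t_2 t_1$ this gives $l = 10$ and $h = 2 \cdot 0 + 1 \cdot 1 + 0 \cdot 2 = 1$.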

Lemma 4. The rewriting process defined by rules (28), (29), (30) and (31)
always terminates.

The next lemma shows that any normal form of the rewrite system is a
bubblian word (a possible execution of the bubble sort algorithm). The proof
of this lemma is of course related to the bubble sort algorithm.

Lemma 5. Let $\omega$ be a reduced word. Then $\omega$ is a bubblian word.

Since the bubblian word associated to a given permutation is unique and the
rewriting process terminates, the bubblian words are exactly the normal forms of the
rewrite system. Notice that we can easily deduce from Theorem 6 the worst case
for the bubble sort algorithm.

6 Conclusion 

In this paper we studied the Gaussian algorithm by considering a rewriting
system over $GL_2(\mathbb{Z})$. First, we believe that our method should be applicable to
other variants of the Gaussian algorithm (for example, the Gaussian algorithm with
other norms [5]): for each variant there is an adequate rewriting system over
$GL_2(\mathbb{Z})$.

The most important and interesting continuation of this work is to generalize
the approach to higher dimensions. Even in three dimensions, the worst-case
configuration of all possible generalizations of the Gaussian algorithm is completely
unknown for the moment. Although the problem is really difficult, we have
already achieved a step, since the LLL algorithm uses the Gaussian algorithm as
an elementary transform.

The group of n-dimensional lattice transformations was first studied by
Nielsen [14] (n = 3) and for an arbitrary n by Magnus [11,12], based on the work
of Nielsen [15]. Their work should certainly help to exhibit such rewrite systems
on $GL_n(\mathbb{Z})$, if any exist.

This approach may also give insight into the still open problem of the
complexity of the optimal LLL algorithm [2,9].

Acknowledgments. The authors are indebted to Brigitte Vallee for drawing 
their attention to algorithmic problems in lattice theory and for regular helpful 
discussions. 
Abstract. Given a finite set $V$, and integers $k \ge 1$ and $r \ge 0$, denote
by $\mathbb{A}(k, r)$ the class of hypergraphs $\mathcal{A} \subseteq 2^V$ with $(k, r)$-bounded
intersections, i.e. in which the intersection of any $k$ distinct hyperedges has size
at most $r$. We consider the problem MIS$(\mathcal{A}, \mathcal{I})$: given a hypergraph $\mathcal{A}$
and a subfamily $\mathcal{I} \subseteq \mathcal{I}(\mathcal{A})$ of its maximal independent sets (MIS) $\mathcal{I}(\mathcal{A})$,
either extend this subfamily by constructing a new MIS $I \in \mathcal{I}(\mathcal{A}) \setminus \mathcal{I}$
or prove that there are no more MIS, that is $\mathcal{I} = \mathcal{I}(\mathcal{A})$. We show that
for hypergraphs $\mathcal{A} \in \mathbb{A}(k, r)$ with $k + r \le$ const, problem MIS$(\mathcal{A}, \mathcal{I})$
is NC-reducible to problem MIS$(\mathcal{A}', \emptyset)$ of generating a single MIS for
a partial subhypergraph $\mathcal{A}'$ of $\mathcal{A}$. In particular, for this class of
hypergraphs, we get an incremental polynomial algorithm for generating all
MIS. Furthermore, combining this result with the currently known
algorithms for finding a single maximal independent set of a hypergraph,
we obtain efficient parallel algorithms for incrementally generating all
MIS for hypergraphs in the classes $\mathbb{A}(1, c)$, $\mathbb{A}(c, 0)$, and $\mathbb{A}(2, 1)$, where $c$
is a constant. We also show that, for $\mathcal{A} \in \mathbb{A}(k, r)$, where $k + r \le$ const,
the problem of generating all MIS of $\mathcal{A}$ can be solved in incremental
polynomial time with space polynomial only in the size of $\mathcal{A}$.



1 Introduction 

Let $\mathcal{A} \subseteq 2^V$ be a hypergraph (set family) on a finite vertex set $V$. A vertex
set $I \subseteq V$ is called independent if $I$ contains no hyperedge of $\mathcal{A}$. Let $\mathcal{I}(\mathcal{A}) \subseteq 2^V$
denote the family of all maximal independent sets (MIS) of $\mathcal{A}$. We assume
that $\mathcal{A}$ is given by the list of its hyperedges and consider problem MIS$(\mathcal{A})$ of
incrementally generating all sets in $\mathcal{I}(\mathcal{A})$. Clearly, this problem can be solved by
performing $|\mathcal{I}(\mathcal{A})| + 1$ calls to the following problem:

MIS$(\mathcal{A}, \mathcal{I})$: Given a hypergraph $\mathcal{A}$ and a collection $\mathcal{I} \subseteq \mathcal{I}(\mathcal{A})$ of its maximal
independent sets, either find a new maximal independent set $I \in \mathcal{I}(\mathcal{A}) \setminus \mathcal{I}$, or
prove that the given collection is complete, $\mathcal{I} = \mathcal{I}(\mathcal{A})$.
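To fix ideas, the incremental scheme just described can be sketched by brute force on tiny instances. This is an exponential-time illustration of the problem statement only, not the algorithm of the paper; all function names are hypothetical.

```python
from itertools import combinations


def is_independent(vertices, hyperedges):
    """A vertex set is independent if it contains no hyperedge."""
    return not any(edge <= vertices for edge in hyperedges)


def maximal_independent_sets(V, hyperedges):
    """All maximal independent sets, by exhaustive enumeration of subsets."""
    edges = [frozenset(e) for e in hyperedges]
    independent = [
        frozenset(c)
        for size in range(len(V) + 1)
        for c in combinations(sorted(V), size)
        if is_independent(frozenset(c), edges)
    ]
    return {I for I in independent if not any(I < J for J in independent)}


def mis_step(V, hyperedges, known):
    """Problem MIS(A, I): return a new MIS, or None if `known` is complete."""
    for I in sorted(maximal_independent_sets(V, hyperedges), key=sorted):
        if I not in known:
            return I
    return None


def generate_all(V, hyperedges):
    """Generate all MIS incrementally; also count the calls to mis_step."""
    known, calls = [], 0
    while True:
        calls += 1
        new = mis_step(V, hyperedges, known)
        if new is None:
            return known, calls
        known.append(new)
```

On the path with vertices {1, 2, 3} and hyperedges {1, 2}, {2, 3}, the maximal independent sets are {1, 3} and {2}, and `generate_all` makes three calls: two that return a new set and a final one that certifies completeness.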
0118635. The research of the first and third authors was also supported in part by the
Office of Naval Research, grant N00014-92-J-1375. The second and third authors are
Center for Discrete Mathematics and Theoretical Computer Science.
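Note that if $I \in \mathcal{I}(\mathcal{A})$ is a maximal independent set, the complement $B = V \setminus I$ is a
minimal transversal to $\mathcal{A}$, that is $B \cap A \ne \emptyset$ for all $A \in \mathcal{A}$, and vice versa. Hence
$\{B \mid B = V \setminus I,\ I \in \mathcal{I}(\mathcal{A})\} = \mathcal{A}^d$, where $\mathcal{A}^d := \{B \mid B \text{ is a minimal transversal to } \mathcal{A}\}$
is the transversal or dual hypergraph of $\mathcal{A}$. For this reason, MIS$(\mathcal{A}, \mathcal{I})$ can be
equivalently stated as the hypergraph dualization problem:

DUAL$(\mathcal{A}, \mathcal{B})$: Given a hypergraph $\mathcal{A}$ and a collection $\mathcal{B} \subseteq \mathcal{A}^d$ of minimal
transversals to $\mathcal{A}$, either find a new minimal transversal $B \in \mathcal{A}^d \setminus \mathcal{B}$ or show
that $\mathcal{B} = \mathcal{A}^d$.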
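The correspondence between maximal independent sets and minimal transversals can be verified by brute force on a small example. The sketch below is exponential and purely illustrative; the function names are hypothetical.

```python
from itertools import combinations


def all_subsets(V):
    """All subsets of V as frozensets."""
    items = sorted(V)
    return [frozenset(c) for size in range(len(items) + 1)
            for c in combinations(items, size)]


def minimal_elements(family):
    """Members of the family that contain no other member."""
    return {X for X in family if not any(Y < X for Y in family)}


def maximal_elements(family):
    """Members of the family contained in no other member."""
    return {X for X in family if not any(X < Y for Y in family)}


def minimal_transversals(V, hyperedges):
    """Minimal vertex sets meeting every hyperedge."""
    edges = [frozenset(e) for e in hyperedges]
    return minimal_elements([X for X in all_subsets(V)
                             if all(X & e for e in edges)])


def maximal_independent_sets(V, hyperedges):
    """Maximal vertex sets containing no hyperedge."""
    edges = [frozenset(e) for e in hyperedges]
    return maximal_elements([X for X in all_subsets(V)
                             if not any(e <= X for e in edges)])


def complements(V, family):
    """Complements in V of the members of the family."""
    return {frozenset(V) - X for X in family}
```

For $V = \{1, 2, 3, 4\}$ with hyperedges {1, 2}, {3, 4}, {1, 3}, the complements of the maximal independent sets coincide with the three minimal transversals {1, 3}, {1, 4} and {2, 3}.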

This problem has applications in combinatorics, graph theory, artificial
intelligence, reliability theory, database theory, integer programming, and learning
theory (see, e.g. [5,9]). It is an open question whether problem DUAL$(\mathcal{A}, \mathcal{B})$,
or equivalently MIS$(\mathcal{A}, \mathcal{I})$, can be solved in polynomial time for arbitrary
hypergraphs. The fastest currently known algorithm [11] for DUAL$(\mathcal{A}, \mathcal{B})$ is
quasi-polynomial: its running time is of the form $\mathrm{poly}(n) + m^{o(\log m)}$, where $n = |V|$ and $m = |\mathcal{A}| + |\mathcal{B}|$.

It was shown in [6,9] that in the case of hypergraphs of bounded dimension,
$\dim(\mathcal{A}) := \max_{A \in \mathcal{A}} |A| \le$ const, problem MIS$(\mathcal{A}, \mathcal{I})$ can be solved in polynomial
time. Moreover, [4] shows that the problem can be efficiently solved in parallel:
MIS$(\mathcal{A}, \mathcal{I}) \in$ NC for $\dim(\mathcal{A}) \le 3$ and MIS$(\mathcal{A}, \mathcal{I}) \in$ RNC for $\dim(\mathcal{A}) = 4, 5, \ldots$
Let us also mention that for graphs, $\dim(\mathcal{A}) \le 2$, all MIS can be generated with
polynomial delay, see [13] and [19].

In [8], a total polynomial time generation algorithm was obtained for the
hypergraphs of bounded degree, $\deg(\mathcal{A}) := \max_{v \in V} |\{A : v \in A \in \mathcal{A}\}| \le$ const.
This result was recently strengthened in [10], where a polynomial delay algorithm
was obtained for a wider class of hypergraphs.

In this paper we consider the class $\mathbb{A}(k, r)$ of hypergraphs with $(k, r)$-bounded
intersections: $\mathcal{A} \in \mathbb{A}(k, r)$ if the intersection of each (at least) $k$ distinct
hyperedges of $\mathcal{A}$ is of cardinality at most $r$. We will always assume that $k \ge 1$ and
$r \ge 0$ are fixed integers whose sum is bounded, $k + r \le c =$ const. Note that

$\dim(\mathcal{A}) \le r$ iff $\mathcal{A} \in \mathbb{A}(1, r)$, and $\deg(\mathcal{A}) < k$ iff $\mathcal{A} \in \mathbb{A}(k, 0)$,

and hence the class $\mathbb{A}(k, r)$ contains both the bounded-dimension and
bounded-degree hypergraphs as subclasses. It will be shown that problem MIS$(\mathcal{A}, \mathcal{I})$ can
be solved in polynomial time for hypergraphs with $(k, r)$-bounded intersections.
It is not difficult to see that for any hypergraph $\mathcal{A} \in \mathbb{A}(k, r)$ the following
property holds for every vertex set $X \subseteq V$: $X$ is contained in a hyperedge of $\mathcal{A}$
whenever each subset of $X$ of cardinality at most $c = k + r$ is contained in a
hyperedge of $\mathcal{A}$. [Indeed, suppose that $X$ is a minimal subset of $V$ not contained
in any hyperedge of $\mathcal{A}$, and that every subset of $X$ of cardinality at most $k + r$
is contained in a hyperedge of $\mathcal{A}$. Note that $|X| \ge k + r + 1$. Let $e_1, \ldots, e_k$
be distinct elements of $X$. Then there exist distinct hyperedges $A_1, \ldots, A_k \in \mathcal{A}$
such that $X \setminus \{e_i\} \subseteq A_i$, for $i = 1, \ldots, k$. Now we get a contradiction to the
fact that $\mathcal{A} \in \mathbb{A}(k, r)$, since $|A_1 \cap \ldots \cap A_k| \ge r + 1$.] Hypergraphs $\mathcal{A} \subseteq 2^V$
with this property were introduced by Berge [3] under the name of c-conformal
hypergraphs, and clearly define a wider class of hypergraphs than $\mathbb{A}(k, r)$ with
$k + r = c$. In fact, we will prove our result for this wider class of c-conformal
hypergraphs.
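Membership in the class of hypergraphs with $(k, r)$-bounded intersections is easy to test for fixed $k$: it suffices to inspect every choice of exactly $k$ distinct hyperedges, since adding further hyperedges can only shrink the intersection. A minimal sketch (the function name is illustrative):

```python
from itertools import combinations


def has_bounded_intersections(hyperedges, k, r):
    """True if any k distinct hyperedges intersect in at most r vertices."""
    edges = list({frozenset(e) for e in hyperedges})   # distinct hyperedges
    return all(
        len(frozenset.intersection(*chosen)) <= r
        for chosen in combinations(edges, k)
    )
```

With `k = 1` this checks that every hyperedge has at most `r` vertices (bounded dimension), and with `r = 0` it checks that no vertex lies in `k` hyperedges (degree less than `k`). For instance, the hyperedges {1, 2, 3}, {3, 4, 5}, {1, 5, 6} pairwise share exactly one vertex, so this family has (2, 1)-bounded intersections but not (2, 0)-bounded ones.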

Theorem 1. For the c-conformal hypergraphs, $c \le$ const, and in particular for
$\mathcal{A} \in \mathbb{A}(k, r)$, $k + r \le c =$ const, problem MIS$(\mathcal{A}, \mathcal{I})$ is polynomial and hence
$\mathcal{I}(\mathcal{A})$, the set of all MIS of $\mathcal{A}$, can be generated in incremental polynomial time.

Theorem 1 is a corollary of the following stronger theorem, which will be
proved in Section 2.

Theorem 2. For any c-conformal hypergraph $\mathcal{A}$, where $c$ is a constant, problem
MIS$(\mathcal{A}, \mathcal{I})$ is NC-reducible to MIS$(\mathcal{A}', \emptyset)$, where $\mathcal{A}'$ is a partial sub-hypergraph
of $\mathcal{A}$.

In Section 2, we also derive some further consequences of Theorem 2, related
to the parallel complexity of problem MIS$(\mathcal{A}, \mathcal{I})$ for certain classes of
hypergraphs.

Let us note that our algorithm for generating $\mathcal{I}(\mathcal{A})$ based on Theorem 1 is
incremental, since it requires solving problem MIS$(\mathcal{A}, \mathcal{I})$ iteratively $|\mathcal{I}(\mathcal{A})| + 1$
times. Thus, this algorithm may require space exponential in the size of the
input hypergraph $N = N(\mathcal{A}) := \sum_{A \in \mathcal{A}} |A|$. A generation algorithm for $\mathcal{I}(\mathcal{A})$ is
said to work in polynomial space if the total space required by the algorithm to
output all the elements of $\mathcal{I}(\mathcal{A})$ is polynomial in $N$. In Section 3, we prove the
following.

Theorem 3. For the hypergraphs of bounded intersections, $\mathcal{A} \in \mathbb{A}(k, r)$, where
$k + r \le$ const, all MIS of $\mathcal{A}$ can be enumerated in incremental polynomial time
and with polynomial space.

Finally, we conclude in Section 4 with a third algorithm for generating all
maximal independent sets of a hypergraph $\mathcal{A} \in \mathbb{A}(k, r)$, $k + r \le$ const.

2 NC-Reduction for c-Conformal Hypergraphs 

The results of [4] show that, for hypergraphs of bounded dimension $\mathbb{A}(1, c)$,
there is an NC-reduction from MIS$(\mathcal{A}, \mathcal{I})$ to MIS$(\mathcal{A}', \emptyset)$, where $\mathcal{A}'$ is a partial
sub-hypergraph of $\mathcal{A}$. In other words, the problem of extending in parallel a
given list of MIS of $\mathcal{A}$ can be reduced to the problem of generating in parallel
a single MIS for a partial sub-hypergraph of $\mathcal{A}$. In this section we extend this
reduction to the class of c-conformal hypergraphs, when $c$ is a constant.



2.1 c-Conformal Hypergraphs

Given a hypergraph $\mathcal{A} \subseteq 2^V$, we say that $\mathcal{A}$ is Sperner if no hyperedge of
$\mathcal{A}$ contains another hyperedge. By definition, for every hypergraph $\mathcal{A}$, its MIS
hypergraph $\mathcal{I}(\mathcal{A})$ is Sperner. Let us invert the operator $\mathcal{I}$. Given a Sperner
hypergraph $\mathcal{B} \subseteq 2^V$, introduce the hypergraph $\mathcal{A} = \mathcal{I}^{-1}(\mathcal{B}) \subseteq 2^V$ whose
hyperedges are all minimal subsets $A \subseteq V$ which are not contained in any hyperedge
of $\mathcal{B}$, that is $A \subseteq B$ for no $A \in \mathcal{A}$, $B \in \mathcal{B}$, and $A' \subseteq B$ for some $B \in \mathcal{B}$ for
each proper subset $A' \subset A \in \mathcal{A}$. The hypergraph $\mathcal{A} = \mathcal{I}^{-1}(\mathcal{B})$ is Sperner by
definition, too. It is also easy to see that $\mathcal{B}$ is the MIS hypergraph of $\mathcal{A}$. In other
words, for Sperner hypergraphs, $\mathcal{B} = \mathcal{I}(\mathcal{A})$ if and only if $\mathcal{A} = \mathcal{I}^{-1}(\mathcal{B})$. In [3],
Berge introduced the class of c-conformal hypergraphs and characterized them
in several equivalent ways as follows.

Proposition 1 ([3]). For each hypergraph $\mathcal{A} \subseteq 2^V$ the following statements
are equivalent: (i) $\mathcal{A}$ is c-conformal; (ii) The transposed hypergraph $\mathcal{A}^T$ (whose
incidence matrix is the transposed incidence matrix of $\mathcal{A}$) satisfies the
$(c - 1)$-dimensional Helly property: a subset of hyperedges from $\mathcal{A}^T$ has a common vertex
whenever every at most $c$ hyperedges of this subset have one; (iii) For each partial
hypergraph $\mathcal{A}' \subseteq \mathcal{A}$ having $c + 1$ edges, the set $\{x \in V \mid d_{\mathcal{A}'}(x) \ge c\}$ of vertices
of degree at least $c$ in $\mathcal{A}'$ is contained in an edge of $\mathcal{A}$.

It is not difficult to see that we can add to the above list the following
equivalent characterization:

(iv) $\dim(\mathcal{I}^{-1}(\mathcal{A})) \le c$.

Note also that (iii) gives a polynomial-time membership test for c-conformal
hypergraphs, for a fixed constant $c$. Thus even though, given a hypergraph $\mathcal{A}$,
the precise computation of $\dim(\mathcal{I}^{-1}(\mathcal{A}))$ is an NP-complete problem (it can be
reduced from stability number for graphs), verifying condition (iv) is polynomial
for every fixed $c$ by Proposition 1.
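The membership test provided by condition (iii) can be sketched as follows: for every choice of $c + 1$ hyperedges, collect the vertices lying in at least $c$ of them and check that this set is contained in some hyperedge. The code is a direct, unoptimized transcription of the condition as stated above (it is vacuously true when there are fewer than $c + 1$ hyperedges); the function name is illustrative.

```python
from itertools import combinations


def is_c_conformal(hyperedges, c):
    """Test condition (iii) of Proposition 1 for the given constant c."""
    edges = list({frozenset(e) for e in hyperedges})
    for partial in combinations(edges, c + 1):
        vertices = set().union(*partial)
        # Vertices of degree at least c in the partial hypergraph.
        high_degree = {x for x in vertices
                       if sum(x in e for e in partial) >= c}
        if not any(high_degree <= e for e in edges):
            return False
    return True
```

For example, the triangle graph with edges {1, 2}, {2, 3}, {1, 3} is not 2-conformal, because all three vertices have degree 2 but no edge contains {1, 2, 3}; adding the hyperedge {1, 2, 3} makes the family 2-conformal. The number of partial hypergraphs inspected is polynomial for fixed $c$.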

Given a hypergraph $\mathcal{A} \subseteq 2^V$, let us introduce the complementary hypergraph
$\mathcal{A}^c = \{V \setminus A \mid A \in \mathcal{A}\}$ whose hyperedges are complementary to the hyperedges
of $\mathcal{A}$. It is easy to see that $\mathcal{A}^{cc} = \mathcal{A}$ and $\mathcal{A}^{dd} = \mathcal{A}$ for each Sperner hypergraph $\mathcal{A}$.
In other words, both operations, duality and complementation, are involutions.
It is also clear that $\mathcal{A}^{dc} = \mathcal{I}(\mathcal{A})$ and $\mathcal{A}^{cd} = \mathcal{I}^{-1}(\mathcal{A})$.

A vertex set $S$ is called a sub-transversal of $\mathcal{A}$ if $S \subseteq B$ for some minimal
transversal $B \in \mathcal{A}^d$. Our proof of Theorem 2 makes use of a characterization of
sub-transversals suggested in [6].

2.2 Characterization of Sub-transversals to a Hypergraph 

Given a hypergraph $\mathcal{A} \subseteq 2^V$, a subset $S \subseteq V$, and a vertex $v \in S$, let $\mathcal{A}_v(S) =
\{A \in \mathcal{A} \mid A \cap S = \{v\}\}$ denote the family of all hyperedges of $\mathcal{A}$ whose intersection
with $S$ is exactly $v$. Let further $\mathcal{A}_0(S) = \{A \in \mathcal{A} \mid A \cap S = \emptyset\}$ denote the partial
hypergraph consisting of the hyperedges of $\mathcal{A}$ disjoint from $S$. A selection of $|S|$
hyperedges $\{A_v \in \mathcal{A}_v(S) \mid v \in S\}$ is called covering if there exists a hyperedge
$A \in \mathcal{A}_0(S)$ such that $A \subseteq \bigcup_{v \in S} A_v$.

Proposition 2 (cf. [6]). Let $S \subseteq V$ be a non-empty vertex set in a hypergraph
$\mathcal{A} \subseteq 2^V$.

i) If $S$ is a sub-transversal for $\mathcal{A}$ then there exists a non-covering selection
$\{A_v \in \mathcal{A}_v(S) \mid v \in S\}$ for $S$.

ii) Given a non-covering selection $\{A_v \in \mathcal{A}_v(S) \mid v \in S\}$ for $S$, we can extend $S$
to a minimal transversal of $\mathcal{A}$ by solving problem MIS$(\mathcal{A}', \emptyset)$ for the induced
partial hypergraph $\mathcal{A}' = \{A \cap U \mid A \in \mathcal{A}_0(S)\} \subseteq 2^U$, where $U = V \setminus \bigcup_{v \in S} A_v$.

Unfortunately, finding a non-covering selection for $S$ (or equivalently, testing
if $S$ is a sub-transversal) is NP-hard if the cardinality of $S$ is not bounded
(see [4]). However, if the size of $S$ is bounded by a constant then there are only
polynomially many selections $\{A_v \in \mathcal{A}_v(S) \mid v \in S\}$ for $S$. All of these selections,
including the non-covering ones, can be easily enumerated in polynomial time
(moreover, it can be done in parallel).
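The enumeration of selections can be sketched as a sequential brute-force search. The code below reads "covering" as: some hyperedge disjoint from $S$ is contained in the union of the selected hyperedges (the union is assumed here, being hard to read in the source). It returns a non-covering selection if one exists and `None` otherwise; the function name is hypothetical and no parallelism is attempted.

```python
from itertools import product


def non_covering_selection(hyperedges, S):
    """Search for a non-covering selection {A_v : v in S} for the set S.

    Returns a dict mapping each v in S to its selected hyperedge,
    or None if every selection is covering (or none exists).
    """
    edges = [frozenset(e) for e in hyperedges]
    S = frozenset(S)
    order = sorted(S)
    disjoint = [A for A in edges if not (A & S)]              # A_0(S)
    families = [[A for A in edges if A & S == {v}] for v in order]  # A_v(S)
    for selection in product(*families):
        union = frozenset().union(*selection)
        if not any(A <= union for A in disjoint):
            return dict(zip(order, selection))
    return None
```

When $|S|$ is bounded by a constant, the number of selections examined is polynomial in the number of hyperedges. On the path with edges {1, 2}, {2, 3}, {3, 4}, the set {1, 4} admits no non-covering selection (the only selection {1, 2}, {3, 4} covers the edge {2, 3}), in agreement with the fact that no minimal transversal of this path contains both 1 and 4.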

Corollary 1. For any fixed $c$ there is an NC algorithm which, given a hypergraph
$\mathcal{A} \subseteq 2^V$ and a set $S$ of at most $c$ vertices, determines whether $S$ is a
sub-transversal to $\mathcal{A}$ and if so finds a non-covering selection $\{A_v \in \mathcal{A}_v(S) \mid v \in S\}$.

Note that this corollary holds for hypergraphs of arbitrary dimension.

2.3 Proof of Theorem 2 

We prove the theorem for the equivalent problem DUAL$(\mathcal{A}, \mathcal{B})$. We may
assume without loss of generality that $\mathcal{A}$ is Sperner. Our reduction consists of the
following steps:

Step 1. By definition, each set $B \in \mathcal{B}$ is a minimal transversal to $\mathcal{A}$. This
implies that each set $A \in \mathcal{A}$ is transversal to $\mathcal{B}$. Check whether each $A \in \mathcal{A}$
is a minimal transversal to $\mathcal{B}$. If not, a new element in $\mathcal{A}^d \setminus \mathcal{B}$ can be found
by calling problem MIS$(\mathcal{A}', \emptyset)$, for some induced partial hypergraph $\mathcal{A}'$ of $\mathcal{A}$.
We may assume therefore that each set in $\mathcal{A}$ is a minimal transversal to $\mathcal{B}$, i.e.
$\mathcal{A} \subseteq \mathcal{B}^d$. Recall that $\mathcal{A}^{dd} = \mathcal{A}$ for each Sperner hypergraph $\mathcal{A}$. Therefore, if
$\mathcal{B} \ne \mathcal{A}^d$ then $\mathcal{A} \ne \mathcal{B}^d$, and thus $\mathcal{B}^d \setminus \mathcal{A} \ne \emptyset$. Hence we arrive at the following
duality criterion: $\mathcal{A}^d \setminus \mathcal{B} \ne \emptyset$ iff there is a sub-transversal $S$ to $\mathcal{B}$ such that

$S \subseteq A$ for no $A \in \mathcal{A}$. (1)

Hence we can apply the sub-transversal test only to $S$ such that

$|S| \le \dim(\mathcal{I}^{-1}(\mathcal{A}))$. (2)

Step 2 (Duality test). For each set $S$ satisfying (1), (2) and the condition that

$A \not\subseteq S$ for all $A \in \mathcal{A}$, (3)

check whether or not

$S$ is a sub-transversal to $\mathcal{B}$. (4)

We need the assumption that $\dim(\mathcal{I}^{-1}(\mathcal{A}))$ is bounded to guarantee that this
step is polynomial (and moreover, is in NC). Recall that by Proposition 2, $S$
satisfies (4) iff there is a selection

$\{B_v \in \mathcal{B}_v(S) \mid v \in S\}$ (5)

which covers no set $B \in \mathcal{B}_0(S)$. Here as before, $\mathcal{B}_0(S) = \{B \in \mathcal{B} \mid B \cap S = \emptyset\}$
and $\mathcal{B}_v(S) = \{B \in \mathcal{B} \mid B \cap S = \{v\}\}$ for $v \in S$.

If conditions (1)-(4) cannot be met, we conclude that $\mathcal{B} = \mathcal{A}^d$ and halt.

Step 3. Suppose we have found a non-covering selection (5) for some set $S$
satisfying (1)-(4). Then it is easy to see that the set
$Z = S \cup \left[ V \setminus \bigcup_{v \in S} B_v \right]$ is
independent in $\mathcal{A}$. Furthermore, $Z$ is transversal to $\mathcal{B}$, because selection (5) is
non-covering. Let $\mathcal{A}' = \{A \cap U \mid A \in \mathcal{A}\}$, where $U = V \setminus Z$, and let $T$ be a
minimal transversal to $\mathcal{A}'$. (As before, we can let $T = U \setminus$ output(MIS$(\mathcal{A}', \emptyset)$).)
Since $Z$ is an independent set of $\mathcal{A}$, we have $T \cap A \ne \emptyset$ for all $A \in \mathcal{A}$, that is $T$
is transversal to $\mathcal{A}$. Clearly, $T$ is minimal, that is $T \in \mathcal{A}^d$. It remains to argue
that $T$ is a new minimal transversal to $\mathcal{A}$, that is $T \notin \mathcal{B}$. This follows from the
fact that $Z$ is transversal to $\mathcal{B}$ and disjoint from $T$. □

Note that Theorem 2 does not imply that MIS($\mathcal{A}$, $\mathcal{I}$) $\in$ NC because the parallel complexity of the resulting problem MIS($\mathcal{A}'$, $\emptyset$) is not known. The question whether it is in NC in general (for arbitrary hypergraphs) was raised in [14]. Affirmative answers were obtained in [1,7,15,17] for the following special cases. For hypergraphs of bounded dimension, $\mathcal{A}' \in \mathbb{A}(1,c)$, it is known that MIS($\mathcal{A}'$, $\emptyset$) $\in$ NC for $c \le 3$, and MIS($\mathcal{A}'$, $\emptyset$) $\in$ RNC for $c = 4, 5, \ldots$, see [2,15]. Furthermore, it was shown in [17,18] that MIS($\mathcal{A}'$, $\emptyset$) $\in$ NC for the so-called linear hypergraphs, in which each two hyperedges intersect in at most one vertex, that is for $\mathcal{A}' \in \mathbb{A}(2,1)$. Finally, it follows from [12] that MIS($\mathcal{A}'$, $\emptyset$) $\in$ NC for hypergraphs of bounded degree, that is for $\mathcal{A}' \in \mathbb{A}(c,0)$. Combining the above results with Theorem 2, we obtain the following corollary.



Corollary 2. Problem MIS($\mathcal{A}$, $\mathcal{I}$) is in RNC for $\mathcal{A} \in \mathbb{A}(1,c)$, where $c$ is a constant (hypergraphs of bounded dimension). Furthermore, MIS($\mathcal{A}$, $\mathcal{I}$) is in NC for $\mathcal{A} \in \mathbb{A}(1,c)$, $c \le 3$ (hypergraphs of dim $\le 3$), for $\mathcal{A} \in \mathbb{A}(c,0)$, where $c$ is a constant (hypergraphs of bounded degree), and for $\mathcal{A} \in \mathbb{A}(2,1)$ (linear hypergraphs).

Yet, for a hypergraph $\mathcal{A}$ satisfying $\dim(\mathcal{I}^{-1}(\mathcal{A})) \le$ const, or even more specifically for $\mathcal{A} \in \mathbb{A}(k,r)$, $k + r \le$ const, we only have an NC-reduction of MIS($\mathcal{A}$, $\mathcal{I}$) to MIS($\mathcal{A}'$, $\emptyset$), where the parallel complexity of the latter problem is not known.



3 Polynomial Space Algorithm for Generating $\mathcal{A}^d$

For $i = 1, \ldots, n$ denote by $[i : n]$ the set $\{i, i+1, \ldots, n\}$, where $[n+1 : n]$ is assumed to be the empty set. Given a hypergraph $\mathcal{A} \subseteq 2^{[n]}$, we shall say that $X \subseteq [n]$ is an $i$-minimal transversal for $\mathcal{A}$ if $X \supseteq [i : n]$, $X$ is a transversal of $\mathcal{A}$, and $X \setminus \{j\}$ is not a transversal for all $j \in X \cap [1 : i-1]$. Thus, $(n+1)$-minimal transversals are just the minimal transversals of $\mathcal{A}$. For $i = 1, \ldots, n$, let $\mathcal{A}^{d_i}$ be the family of $i$-minimal transversals for $\mathcal{A}$.

Given $i \in [n]$ and $X \in \mathcal{A}^{d_i}$, let $\mathcal{A}_i(X)$ be the hypergraph

$\mathcal{A}_i(X) = \{A \setminus \{i\} : A \in \mathcal{A},\ A \cap X = \{i\}\}$.
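The definitions above can be checked mechanically on small inputs. The following is a minimal sketch, assuming hyperedges are given as Python frozensets over the ground set {1, ..., n}; the function names are hypothetical and do not come from the paper.

```python
def is_transversal(X, edges):
    """True if X intersects every hyperedge."""
    return all(X & A for A in edges)


def is_i_minimal_transversal(X, edges, i, n):
    """Check the three conditions defining an i-minimal transversal."""
    # X must contain the whole tail [i : n] (empty when i = n + 1).
    if not set(range(i, n + 1)) <= X:
        return False
    if not is_transversal(X, edges):
        return False
    # No element j < i may be removable.
    return all(not is_transversal(X - {j}, edges) for j in X if j < i)


def partial_hypergraph(X, edges, i):
    """A_i(X): the edges meeting X exactly in {i}, with i removed."""
    return [A - {i} for A in edges if A & X == {i}]
```

For example, with edges {1,2} and {2,3} on [3], the full set [3] is 1-minimal, while the 4-minimal transversals are exactly the minimal transversals {2} and {1,3}.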
Proposition 3 (see [10,16]).

(i) $|\mathcal{A}^{d_i}| \le |\mathcal{A}^d|$ for $i = 1, \ldots, n+1$.

(ii) $|\mathcal{A}_i(X)^d| \le |\mathcal{A}^d|$ for $i \in [n]$ and $X \in \mathcal{A}^{d_i}$.

Now consider the following generalization of an algorithm in [19] for generating maximal independent sets in graphs (see also [13] and [16]). Given $i \in [n]$ and $X \in \mathcal{A}^{d_i}$, we assume in the algorithm that the minimal transversals $\mathcal{A}_i(X)^d$ are computed by calling a process $P(i,X)$ that invokes the same algorithm recursively on the partial hypergraph $\mathcal{A}_i(X)$. We further assume that, once $P(i,X)$ finds an element $Y \in \mathcal{A}_i(X)^d$, it returns control to the calling process GEN($\mathcal{A}$, $i$, $X$). When called for the next time, $P(i,X)$ returns the next element of $\mathcal{A}_i(X)^d$ that has not been generated yet, if such an element exists.

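Algorithm GEN($\mathcal{A}$, $i$, $X$):

Input: A hypergraph $\mathcal{A}$, an index $i \in [n]$, and an $i$-minimal transversal $X \in \mathcal{A}^{d_i}$.
Output: All minimal transversals of $\mathcal{A}$.

1. if $i = n + 1$ then
2.     output $X$;
3. else
4.     if $X \setminus \{i\}$ is a transversal of $\mathcal{A}$ then
5.         GEN($\mathcal{A}$, $i+1$, $X \setminus \{i\}$);
6.     else
7.         GEN($\mathcal{A}$, $i+1$, $X$);
8.         for each minimal transversal $Y \in \mathcal{A}_i(X)^d$ (found recursively) do
9.             if $X \cup Y \setminus \{i\} \in \mathcal{A}^{d_{i+1}}$ then
10.                Compute the lexicographically largest set $Z \subseteq X \cup Y$ such that $Z \in \mathcal{A}^{d_i}$;
11.                if $Z = X$ then
12.                    GEN($\mathcal{A}$, $i+1$, $X \cup Y \setminus \{i\}$);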



Lemma 1. When called with $i = 1$ and $X = [n]$, Algorithm GEN($\mathcal{A}$, $i$, $X$) outputs all minimal transversals of $\mathcal{A}$ with no repetitions.

Proof. Consider the recursion tree $\mathbf{T}$ traversed by the algorithm. Label each node of the tree by the pair $(i, X)$ which represents the input to the algorithm at this node. Clearly $i$ represents the level of node $(i, X)$ in the tree (where the root of $\mathbf{T}$ is at level 1). By induction on $i = 1, \ldots, n+1$, we can verify the following statement:

$\mathcal{A}^{d_i} = \{X \subseteq [n] : (i, X) \in \mathbf{T}\}$. (6)

Indeed, this trivially holds at $i = 1$. Assume now that (6) holds for a specific $i \in [n-1]$. It is easy to see that any node $(i+1, X) \in \mathbf{T}$ generated at level $i+1$ of the tree must have $X \in \mathcal{A}^{d_{i+1}}$. Thus it remains to verify that $\mathcal{A}^{d_{i+1}} \subseteq \{X : (i+1, X) \in \mathbf{T}\}$. To see this, let $X'$ be an arbitrary element of $\mathcal{A}^{d_{i+1}}$. Note first that if $X' \ni i$ then $X' \setminus \{i\}$ is not a transversal of $\mathcal{A}$ and $X' \in \mathcal{A}^{d_i}$, and therefore by induction we have a node $(i, X') \in \mathbf{T}$. Consequently, we get a node $(i+1, X') \in \mathbf{T}$ as a child of $(i, X') \in \mathbf{T}$, by Step 7 of the algorithm. Let us therefore assume that $X' \not\ni i$. Note that $X'$ must contain a subset $X \setminus \{i\}$, for some $X \in \mathcal{A}^{d_i}$. This is because $X' \cup \{i\}$ is a transversal and therefore it contains an $i$-minimal transversal $X$ of $\mathcal{A}$. Among all the sets $X$ satisfying this property, let $Z$ be the lexicographically largest. Now, if $Z \setminus \{i\}$ is a transversal of $\mathcal{A}$, then $X' = Z \setminus \{i\}$, and Step 5 will create a node $(i+1, X') \in \mathbf{T}$ as the only child of $(i, Z) \in \mathbf{T}$. On the other hand, if $Z \setminus \{i\}$ is not a transversal, then $X'$ can be written as $X' = Z \cup Y \setminus \{i\}$, for some $Y \in \mathcal{A}_i(Z)^d$. But then node $(i+1, X')$ will be generated as a child of $(i, Z) \in \mathbf{T}$ by Step 12 of the algorithm. This completes the proof of (6). Finally, it follows from Step 10 that each node in the tree is generated as the child of exactly one other node. Consequently each leaf is visited, and hence each set $X \in \mathcal{A}^d$ is output, only once, and the lemma follows. □

The next lemma states that, for hypergraphs $\mathcal{A}$ of $(k, r)$-bounded intersections, Algorithm GEN is a polynomial-space, output-polynomial time algorithm for generating all minimal transversals of $\mathcal{A}$.

Lemma 2. The time taken by Algorithm GEN until it outputs the last minimal transversal of a hypergraph $\mathcal{A} \in \mathbb{A}(k,r)$ is $O(n^{k+r-1} |\mathcal{A}^d|^{r+1})$ and the total space required is $O(N^{r+1})$.

Proof. For a hypergraph $\mathcal{A} \in \mathbb{A}(k,r)$, let $T(\mathcal{A})$ and $M(\mathcal{A})$ be respectively the time and space required by Algorithm GEN to output the last minimal transversal of $\mathcal{A}$. Note that the algorithm basically performs depth-first search on the tree $\mathbf{T}$ (whose leaves are the elements of $\mathcal{A}^d$) and only generates nodes of $\mathbf{T}$ as needed during the search. Since each node of the tree $\mathbf{T}$ which is not a leaf has at least one child, the time between two successive outputs generated by the algorithm does not exceed the time required to generate the children of nodes along a complete path of the tree $\mathbf{T}$ from the root to a leaf. But, as can be seen from the algorithm, for a given node $v = (i, X)$ in $\mathbf{T}$, where $i \in [n]$ and $X \in \mathcal{A}^{d_i}$, the time required to generate all the children of $v$ is bounded by the time to output all the elements of $\mathcal{A}_i(X)^d$. Since the depth of the tree is $n + 1$, we get the recurrence

$T(\mathcal{A}) \le n |\mathcal{A}^d| \max\{T(\mathcal{A}_i(X)) : i \in [n],\ X \in \mathcal{A}^{d_i}\}$. (7)

Note that $\mathcal{A}_i(X) \in \mathbb{A}(k, r-1)$. Furthermore, by Proposition 3, we have $|\mathcal{A}_i(X)^d| \le |\mathcal{A}^d|$, and thus (7) gives $T(\mathcal{A}) \le (n|\mathcal{A}^d|)^r T(\mathcal{A}')$, for some sub-hypergraph $\mathcal{A}' \in \mathbb{A}(k,0)$ of $\mathcal{A}$ which satisfies $|(\mathcal{A}')^d| \le |\mathcal{A}^d|$. Now, we observe that for any $i \in [n]$ and $X \in (\mathcal{A}')^{d_i}$, we have $|\mathcal{A}'_i(X)| \le k - 1$, and hence it follows that $T(\mathcal{A}') = O(n^{k-1} |(\mathcal{A}')^d|)$. The bound on the running time follows.

Now let us consider the total memory required by the algorithm. Since, for each recursion tree (corresponding to a (sub-)hypergraph that is to be dualized), the algorithm maintains only the path from the root to a leaf of the tree, we get the recurrence $M(\mathcal{A}) \le N \max\{M(\mathcal{A}_i(X)) : i \in [n],\ X \in \mathcal{A}^{d_i}\}$. This recurrence again gives $M(\mathcal{A}) \le N^r M(\mathcal{A}')$, for some sub-hypergraph $\mathcal{A}' \in \mathbb{A}(k,0)$ of $\mathcal{A}$. But $M(\mathcal{A}') = O(N)$ and the bound on the space follows. □
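As a reading aid, the two recurrences in the proof can be unrolled explicitly. The sketch below assumes the bounds as stated in the proof (each recursive call lowers $r$ by one, and $|(\mathcal{A}')^d| \le |\mathcal{A}^d|$); the intermediate hypergraphs $\mathcal{A}^{(1)}, \mathcal{A}^{(2)}, \ldots$ are notation introduced here only for illustration.

```latex
\begin{align*}
T(\mathcal{A})
  &\le n|\mathcal{A}^d| \, T(\mathcal{A}^{(1)})
   \le (n|\mathcal{A}^d|)^2 \, T(\mathcal{A}^{(2)})
   \le \dots
   \le (n|\mathcal{A}^d|)^r \, T(\mathcal{A}'),
   \qquad \mathcal{A}^{(j)} \in \mathbb{A}(k, r-j),\ \mathcal{A}' \in \mathbb{A}(k,0), \\
  &= (n|\mathcal{A}^d|)^r \cdot O\!\left(n^{k-1}\,|(\mathcal{A}')^d|\right)
   = O\!\left(n^{k+r-1}\,|\mathcal{A}^d|^{r+1}\right), \\
M(\mathcal{A})
  &\le N^r \, M(\mathcal{A}') = N^r \cdot O(N) = O\!\left(N^{r+1}\right).
\end{align*}
```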
Now Theorem 3 follows by combining Lemma 2 with the following reduction.



Proposition 4. Let $\mathcal{A} \subseteq 2^{[n]}$ be a hypergraph. Suppose that there is an algorithm $P$ that generates all minimal transversals of $\mathcal{A}$ in time $p(n, |\mathcal{A}^d|)$ and space $q(N(\mathcal{A}))$, for some polynomials $p(\cdot,\cdot)$ and $q(\cdot)$. Then for any integer $k$, we can generate at least $k$ minimal transversals of $\mathcal{A}$ in time $2n(p(n,k) + 1)$ and space $q(N(\mathcal{A}))$.

Note that it is implicit in the proof of Lemma 2 that, for both graphs $\mathcal{A} \in \mathbb{A}(1,2)$ and hypergraphs of bounded degree $\mathcal{A} \in \mathbb{A}(c,0)$, Algorithm GEN is in fact a polynomial delay and polynomial space algorithm for generating $\mathcal{A}^d$. In particular, Theorem 3 implies the following previously known results [10,13,19].

Corollary 3. For graphs, $\mathcal{A} \in \mathbb{A}(1,2)$, and also for the hypergraphs of bounded degree, $\mathcal{A} \in \mathbb{A}(c,0)$, all minimal transversals of $\mathcal{A}$ can be enumerated with polynomial delay and polynomial space.

4 Generating $\mathcal{A}^d$ Using the Supergraph Approach

Let $\mathcal{A} \subseteq 2^V$ be a hypergraph. In this section, we sketch another algorithm to list all minimal transversals of $\mathcal{A}$. The algorithm works by building a strongly connected directed supergraph $\mathcal{G} = (\mathcal{A}^d, \mathcal{E})$ on the set of minimal transversals, in which a pair of vertices $(X, X')$ forms an edge in $\mathcal{E}$ if and only if $X'$ can be obtained from $X$ by deleting an element from $X \setminus X'$, adding a minimal subset of elements from $X' \setminus X$ to obtain a transversal, and finally reducing the resulting set to a minimal feasible solution in a specified way (say in reverse-lexicographic order). In other words, $(X, X') \in \mathcal{E}$ if and only if $X' \subseteq X \cup Z \setminus \{e\}$, for some $e \in X \setminus X'$ and $Z \subseteq X' \setminus X$, such that $Z$ is minimal with the property that $X \cup Z \setminus \{e\}$ is a transversal.

The strong connectivity of $\mathcal{G}$ can be proved as follows. Given two vertices $X_0, X_l \in \mathcal{A}^d$ of $\mathcal{G}$, there exists a set $\{X_1, \ldots, X_{l-1}\}$ of vertices of $\mathcal{G}$, where for all $i = 1, \ldots, l$, $X_i$ is obtained from $X_{i-1}$ by deleting an element $e_i \in X_{i-1} \setminus X_l$ (thus making $X_{i-1} \setminus \{e_i\}$ non-transversal), adding a minimal subset of elements $Z_i \subseteq X_l \setminus X_{i-1}$ to obtain a transversal $X_{i-1} \setminus \{e_i\} \cup Z_i$, and finally, reducing the resulting set to a minimal transversal $X_i \subseteq X_{i-1} \cup Z_i \setminus \{e_i\}$. Note that, for $i = 1, \ldots, l$, $|X_i \setminus X_l| < |X_{i-1} \setminus X_l|$ and therefore $l \le |X_0 \setminus X_l|$. In other words, $\mathcal{G}$ has diameter at most $n$.

The minimal transversals of $\mathcal{A}$ can thus be generated by performing breadth-first search on the vertices of $\mathcal{G}$, starting from an arbitrary vertex. Such a procedure can be executed in incremental polynomial time if the neighbourhood of every vertex in $\mathcal{G}$ can also be generated in (incremental) polynomial time. Given a hypergraph $\mathcal{A} \in \mathbb{A}(k,r)$ and a minimal transversal $X \in \mathcal{A}^d$, all neighbours of $X$ in $\mathcal{G}$ can be generated in time $O(n^{k+r-1} |\mathcal{A}^d|^{r+1})$. Indeed, for any $e \in X$, all minimal subsets of vertices $Z$, such that $X \setminus \{e\} \cup Z$ is a transversal of $\mathcal{A}$, can be obtained by finding all minimal transversals for the hypergraph $\mathcal{A}_e(X) = \{A \setminus \{e\} : A \in \mathcal{A},\ A \cap X = \{e\}\}$. But as noted before, $\mathcal{A}_e(X) \in \mathbb{A}(k, r-1)$ and $|\mathcal{A}_e(X)^d| \le |\mathcal{A}^d|$. We conclude therefore, as in the proof of Lemma 2, that the time required to produce all the neighbours of $X$ by applying the algorithm recursively on each of the hypergraphs $\mathcal{A}_e(X)$, for $e \in X$, is $O(n^{k+r-1} |\mathcal{A}^d|^{r+1})$.
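The supergraph traversal can be illustrated on small inputs. The sketch below is not the algorithm analysed above: it assumes a brute-force enumerator in place of the recursive call on each partial hypergraph, and it fixes "drop the largest removable element first" as the specified reduction rule. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def is_transversal(X, edges):
    """True if X intersects every hyperedge."""
    return all(X & A for A in edges)


def minimal_transversals_bruteforce(edges, ground):
    """All minimal transversals, by increasing size (exponential time)."""
    found = []
    for size in range(len(ground) + 1):
        for cand in combinations(sorted(ground), size):
            c = frozenset(cand)
            # Keep c only if it is a transversal containing no smaller one.
            if is_transversal(c, edges) and not any(m <= c for m in found):
                found.append(c)
    return found


def reduce_to_minimal(X, edges):
    """Greedily drop elements, largest first, while X stays a transversal."""
    X = set(X)
    for v in sorted(X, reverse=True):
        if is_transversal(X - {v}, edges):
            X.discard(v)
    return frozenset(X)


def neighbours(X, edges, ground):
    """Out-neighbours of the minimal transversal X in the supergraph."""
    out = set()
    for e in X:
        # Partial hypergraph: edges hit by X only in e, with e removed.
        A_e = [A - {e} for A in edges if A & X == {e}]
        for Z in minimal_transversals_bruteforce(A_e, ground - X):
            out.add(reduce_to_minimal((X - {e}) | Z, edges))
    return out


def enumerate_by_bfs(edges, ground):
    """Breadth-first search over the supergraph from one minimal transversal."""
    start = reduce_to_minimal(ground, edges)
    seen, queue = {start}, deque([start])
    while queue:
        X = queue.popleft()
        for Y in neighbours(X, edges, ground):
            if Y not in seen:
                seen.add(Y)
                queue.append(Y)
    return seen
```

On the path hypergraph with edges {1,2}, {2,3}, {3,4}, the search starts at {1,3} and reaches {2,3} and {2,4}, which are all the minimal transversals.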
Abstract. Given a set $\mathcal{T}$ of rooted, unordered trees, where each $T_i \in \mathcal{T}$ is distinctly leaf-labeled by a set $\Lambda(T_i)$ and where the sets $\Lambda(T_i)$ may overlap, the maximum agreement supertree problem (MASP) is to construct a distinctly leaf-labeled tree $Q$ with leaf set $\Lambda(Q) \subseteq \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)$ such that $|\Lambda(Q)|$ is maximized and for each $T_i \in \mathcal{T}$, the topological restriction of $T_i$ to $\Lambda(Q)$ is isomorphic to the topological restriction of $Q$ to $\Lambda(T_i)$. Let $n = |\bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)|$, $k = |\mathcal{T}|$, and $D = \max_{T_i \in \mathcal{T}} \{\deg(T_i)\}$. We first show that MASP with $k = 2$ can be solved in $O(\sqrt{D}\, n \log(2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted. We then present an algorithm for MASP with $D = 2$ whose running time is polynomial if $k = O(1)$. On the other hand, we prove that MASP is NP-hard for any fixed $k \ge 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \ge 2$ when $k$ is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time $(n/\log n)$-approximation algorithm for MASP.



1 Introduction 

An important objective in phylogenetics is to develop methods for merging a col- 
lection of phylogenetic trees on overlapping sets of taxa into a single supertree so 
that no (or as little as possible) branching information is lost. Ideally, the result- 
ing supertree can then be used to deduce evolutionary relationships between taxa 
which do not occur together in any one of the input trees. Supertree methods are 
useful because most individual studies investigate relatively few taxa [22] and 
because sample bias leads to certain taxa being studied much more frequently 
than others [4]. Also, supertree methods can combine trees constructed for different types of data or under different models of evolution. Furthermore, although
computationally expensive methods for constructing reliable phylogenetic trees 
are infeasible for large sets of taxa, they can be applied to obtain highly accurate 
trees for smaller, overlapping subsets of the taxa that may then be merged using 
less computationally intense, supertree-based techniques (see, e.g., [7,16,20]). 

Since the set of trees which is to be combined may in practice contain con- 
tradictory branching structure (for example, if the trees have been constructed 
from data originating from different genes or if the experimental data contains 
errors), a supertree method needs to specify how to resolve conflicts. In this paper, we consider maximum agreement supertrees. The intuitive idea is to identify and remove a smallest possible subset of the taxa so that the remaining taxa can be combined without conflicts. In this way, one would get an indication of which ancestral relationships can be regarded as resolved and which taxa need to be subjected to further experiments. We formalize the above as a computational problem called the maximum agreement supertree problem (MASP).

Fig. 1. Let $T$ be the tree on the left. Then $T \mid \{a, c, d, h\}$ is the tree shown on the right.

Further motivation for studying maximum agreement supertrees comes from 
the relation to a well-studied problem known as the maximum agreement subtree 
problem (MAST) in which the input is a set of leaf-labeled trees and the goal is 
to compute a tree contained in all of the input trees with as many labeled leaves 
as possible. Our results in this paper complement those previously known for 
MAST. The computational complexity of MAST has been closely investigated 
(see Section 1.2), motivated by the practical usefulness of maximum agreement 
subtrees. For example, maximum agreement subtrees can be used not only to 
identify small problematic subsets of taxa during phylogenetic reconstruction, 
but also to measure the similarity of a given set of trees [9,11,19] or to estimate 
a classification’s stability to small changes in the data [11]. Moreover, MAST- 
based algorithms have been used to prepare and improve bilingual context-using 
dictionaries for automated language translation systems [8,21]. 



1.1 Problem Definitions 

Let $T$ be a tree whose leaves are labeled by a set $S$. We say that $T$ is distinctly leaf-labeled by $S$ if no two leaves in $T$ have the same label. Below, each leaf in such a tree is identified with its corresponding label in $S$. Given a rooted, unordered, distinctly leaf-labeled tree $T$ and a set $S'$, the topological restriction of $T$ to $S'$ (denoted by $T \mid S'$) is the tree obtained by deleting from $T$ all nodes which are not on any path from the root to a leaf in $S'$ along with their incident edges, and then contracting every edge between a node having just one child and its child (see Fig. 1). For any tree $T$, denote its set of leaves by $\Lambda(T)$.
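The topological restriction is easy to state recursively. The following minimal sketch assumes a tree encoding chosen here for illustration only: a leaf is a string label and an internal node is a frozenset of its child subtrees, so that unordered trees with distinct leaf labels are isomorphic exactly when their encodings are equal. The names `node` and `restrict` are hypothetical.

```python
def node(*children):
    """Build an internal node from its (unordered) children."""
    return frozenset(children)


def restrict(tree, labels):
    """Return the topological restriction of `tree` to `labels`, or None if empty."""
    if isinstance(tree, str):
        return tree if tree in labels else None
    # Restrict every child and discard subtrees that lost all their leaves.
    kept = [r for r in (restrict(c, labels) for c in tree) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # A node with a single remaining child is contracted into that child.
        return kept[0]
    return frozenset(kept)
```

For instance, restricting ((a,b),(c,(d,e))) to {a, d, e} deletes b and c and contracts the two resulting single-child nodes, giving (a,(d,e)).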

Let $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ be a set of rooted, unordered trees, where each $T_i$ is distinctly leaf-labeled and where the sets $\Lambda(T_i)$ may overlap. A total agreement supertree of $\mathcal{T}$ is a tree $Q$ such that $Q$ is distinctly leaf-labeled by $\bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)$ and $Q \mid \Lambda(T_i)$ is isomorphic to $T_i$ for every $T_i \in \mathcal{T}$. Note that two or more trees in $\mathcal{T}$ may contain conflicting branching information, in which case a total agreement supertree of $\mathcal{T}$ does not exist. The total agreement supertree problem (TASP) is: Given a set $\mathcal{T}$ of distinctly leaf-labeled, rooted, unordered trees, output a total agreement supertree of $\mathcal{T}$ if one exists, otherwise output null.

When $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ is specified, we write $S = \bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)$ and call $S$ the leaf set of $\mathcal{T}$. For any $S' \subseteq S$, we let $\mathcal{T} \mid S'$ denote the set $\{T_1 \mid S', T_2 \mid S', \ldots, T_k \mid S'\}$. If there exists a total agreement supertree $Q$ of $\mathcal{T} \mid S'$ then we say that $S'$ is consistent with $\mathcal{T}$ and call $Q$ an agreement supertree of $\mathcal{T}$. A maximum agreement supertree of $\mathcal{T}$ is an agreement supertree of $\mathcal{T}$ with as many leaves as possible. The maximum agreement supertree problem (MASP) is: Given a set $\mathcal{T}$ of distinctly leaf-labeled, rooted, unordered trees, output a maximum agreement supertree of $\mathcal{T}$. An agreement subtree of $\mathcal{T}$ is a tree $U$ such that for some $S' \subseteq S$ it holds that $U$ is distinctly leaf-labeled by $S'$ and $T_i \mid S'$ is isomorphic to $U$ for every $T_i \in \mathcal{T}$. A maximum agreement subtree of $\mathcal{T}$ is an agreement subtree of $\mathcal{T}$ with the maximum possible number of leaves. The maximum agreement subtree problem (MAST), also referred to in the literature as the maximum homeomorphic subtree problem (MHT), is to find a maximum agreement subtree of $\mathcal{T}$.
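To make the MAST definition concrete, here is a brute-force sketch that tries every subset of the common leaves, largest first. It is exponential and is not one of the algorithms cited in this paper; it assumes the same illustrative encoding as before (a leaf is a string, an internal node is a frozenset of subtrees), and the function names are hypothetical.

```python
from itertools import combinations


def leaves(tree):
    """Set of leaf labels of a tree (leaf = str, internal node = frozenset)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(c) for c in tree))


def restrict(tree, labels):
    """Topological restriction of `tree` to `labels`, or None if empty."""
    if isinstance(tree, str):
        return tree if tree in labels else None
    kept = [r for r in (restrict(c, labels) for c in tree) if r is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else frozenset(kept)


def brute_force_mast(trees):
    """Return a maximum agreement subtree, or None if no leaf is shared."""
    # An agreement subtree can only use labels that occur in every tree.
    common = set.intersection(*(leaves(t) for t in trees))
    for size in range(len(common), 0, -1):
        for subset in combinations(sorted(common), size):
            restricted = [restrict(t, set(subset)) for t in trees]
            # With distinct labels, isomorphism is equality of encodings.
            if all(r == restricted[0] for r in restricted):
                return restricted[0]
    return None
```

For the two trees ((a,b),(c,d)) and ((a,c),(b,d)), no three common leaves induce the same restriction, so the sketch returns an agreement subtree with two leaves.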

Throughout this paper, we let $n$ denote the cardinality of the leaf set and $k$ the number of input trees, i.e., $n = |\bigcup_{T_i \in \mathcal{T}} \Lambda(T_i)|$ and $k = |\mathcal{T}|$ in the problem definitions above. We let $D = \max_{T_i \in \mathcal{T}} \{\deg(T_i)\}$, where $\deg(T_i)$ is the degree¹ of $T_i$. We assume that none of the trees in $\mathcal{T}$ have a node with degree 1, so that each tree contains $O(n)$ nodes. Note that if we are given a subset $S'$ of $S$ which is consistent with $\mathcal{T}$, then we can efficiently construct a total agreement supertree of $\mathcal{T} \mid S'$ using the algorithm for TASP by Henzinger et al. [16] (see also Lemma 7 in Section 5). Hence, we focus on the subproblem of MASP of computing a maximum cardinality subset $S'$ of $S$ such that $S'$ is consistent with $\mathcal{T}$.

A rooted triplet is a distinctly leaf-labeled, binary, rooted, unordered tree with three leaves. The unique rooted triplet on $\{a, b, c\}$ in which the lowest common ancestor of $a$ and $b$ is a proper descendant of the lowest common ancestor of $a$ and $c$ (or equivalently, where the lowest common ancestor of $a$ and $b$ is a proper descendant of the lowest common ancestor of $b$ and $c$) is denoted by $(\{a,b\},c)$.



1.2 Previous Results 

Comprehensive surveys of existing methods for constructing supertrees can be found in [4,20,22]. Below, we mention some known results related to MASP.

Aho, Sagiv, Szymanski, and Ullman [1] presented an algorithm which can be used to solve TASP in $O(kn)$ time when all trees in $\mathcal{T}$ are rooted triplets. Several years later, Henzinger, King, and Warnow [16] showed how to modify the algorithm to solve TASP for any $\mathcal{T}$ in $\min\{O(N n^{0.5}),\ O(N + n^2 \log n)\}$ time, where $N$ is the total number of nodes in $\mathcal{T}$. In contrast, the analog of TASP for unrooted trees is NP-hard, even if all of the input trees are quartets (distinctly leaf-labeled, unrooted trees each having four leaves and no nodes with precisely two neighbors) [23]. A polynomial-time algorithm for computing an unrooted total agreement supertree if one exists when all $k$ input trees are binary and $k = O(1)$ was given by Bryant in [6].

¹ The degree of a node $u$ in a rooted tree is the number of children of $u$. The degree of a rooted tree $T$ is the maximum degree of all nodes in $T$.

The computational complexity of MAST has been studied extensively (e.g., [3,5,8,9,10,11,14,15,19,24]). Today, the fastest known algorithm for MAST for two trees, invented by Kao, Lam, Sung, and Ting [19], runs in $O(\sqrt{D}\, n \log(2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted.

Amir and Keselman [3] considered the case of $k \ge 3$ input trees. They proved that MAST is NP-hard for three trees with unrestricted degrees, but solvable in polynomial time for three or more trees if the degree of at least one of the trees is bounded by a constant. For the latter case, Farach, Przytycka, and Thorup [9] gave an algorithm with improved efficiency running in $O(kn^3 + n^d)$ time, where $d$ is an upper bound on at least one of the input trees' degrees; Bryant [5] proposed a conceptually different algorithm with the same running time.

Hein, Jiang, Wang, and Zhang [15] proved the following inapproximability result: MAST for three trees with unrestricted degrees cannot be approximated within a factor of $2^{\log^{\delta} n}$ in polynomial time for any constant $\delta < 1$, unless NP $\subseteq$ DTIME[$2^{\mathrm{polylog}\, n}$]. Gąsieniec, Jansson, Lingas, and Östlin [14] proved that MAST cannot be approximated within a factor of $n^{\epsilon}$ in polynomial time, for any sufficiently small constant $\epsilon \ge 0$ (the precise range is given in [14]), unless P = NP, even for instances containing only trees of height 2, and showed that if the number of trees is bounded by a constant and all the input trees' heights are bounded by a constant then MAST can be approximated within a constant factor in $O(n \log n)$ time.

A problem related to MASP and MAST is the maximum refinement subtree problem (MRST). Its goal is to construct a tree $W$ with $\Lambda(W) \subseteq S$ which maximizes $|\Lambda(W)|$ such that for each $T_i \in \mathcal{T}$, $T_i \mid \Lambda(W)$ can be obtained from $W$ by applying a series of edge contractions. MRST is NP-hard for $k = 2$ if $D$ is unrestricted [15] but solvable in polynomial time if $k = O(1)$ and $D = O(1)$ [12]. Another related problem is the maximum compatible subset of rooted triplets problem (MCSR) in which the input is a set $\mathcal{T}$ of rooted triplets and the objective is to find a $\mathcal{T}' \subseteq \mathcal{T}$ of maximum cardinality such that there exists a total agreement supertree of $\mathcal{T}'$. MCSR is NP-hard [5,18]; two polynomial-time approximation algorithms for MCSR were given in [14].

1.3 Our Results and Organization of Paper 

In Section 2, we make use of known positive and negative results for MAST to obtain an efficient algorithm for MASP restricted to $k = 2$ and an NP-hardness proof for MASP restricted to any fixed $k \ge 3$, respectively. The algorithm for $k = 2$ runs in $O(\sqrt{D}\, n \log(2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted. Then, in Section 3, we present a more complex MAST-based algorithm for solving MASP with $D = 2$. It runs in $O(k(2n)^{3k^2})$ time, which is polynomial when $k = O(1)$. In Section 4, we prove that MASP is NP-hard even if all of the input trees are required to be rooted triplets (i.e., $D = 2$ and $k$ is unrestricted). Finally, in Section 5, we describe a simple polynomial-time approximation algorithm for MASP which is guaranteed to find an approximate solution with at least $\frac{\log n}{n}$ times the number of leaves in an optimal solution.

2 Preliminaries

We first investigate the close relationship between MASP and MAST. 

Lemma 1. For any set $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ of distinctly leaf-labeled, rooted, unordered trees such that $\Lambda(T_1) = \Lambda(T_2) = \ldots = \Lambda(T_k)$, an optimal solution to MASP for $\mathcal{T}$ is an optimal solution to MAST for $\mathcal{T}$ and vice versa.

Proof. Write $S = \Lambda(T_1) = \Lambda(T_2) = \ldots = \Lambda(T_k)$, let $Q$ be any agreement supertree of $\mathcal{T}$, and let $S' = \Lambda(Q)$. Then, by definition, $Q \mid \Lambda(T_i \mid S') = T_i \mid S'$ for every $T_i \in \mathcal{T}$. Now, $\Lambda(T_i \mid S') = S \cap S' = S'$, so $T_i \mid S' = Q \mid S' = Q$ for every $T_i \in \mathcal{T}$, which means that $Q$ is an agreement subtree of $\mathcal{T}$. Conversely, let $U$ be an agreement subtree of $\mathcal{T}$ whose leaves are distinctly labeled by some set $S'$. For every $T_i \in \mathcal{T}$, we have $T_i \mid S' = U$. Then $U \mid \Lambda(T_i \mid S') = (T_i \mid S') \mid \Lambda(T_i \mid S') = T_i \mid S'$ for every $T_i \in \mathcal{T}$, i.e., $U$ is an agreement supertree of $\mathcal{T}$. □

Theorem 1. MASP with $k = 2$ can be solved in $O(\sqrt{D}\, n \log(2n/D))$ time.

Proof. Given an instance $\mathcal{T} = \{T_1, T_2\}$ of MASP with $k = 2$, let $L = \Lambda(T_1) \cap \Lambda(T_2)$ and run the algorithm of Kao, Lam, Sung, and Ting [19] on the instance $\mathcal{T} \mid L$ to obtain a maximum agreement subtree $U$ of $\mathcal{T} \mid L$. This takes $O(\sqrt{D}\, n \log(2n/D))$ time. By Lemma 1, $U$ is also a maximum agreement supertree of $\mathcal{T} \mid L$. Next, for every leaf which appears in exactly one of $T_1$ and $T_2$, insert it into $U$ according to its position in $T_1$ or $T_2$. More precisely, let $X = L \setminus \Lambda(U)$ and first compute $T_1' = T_1 \mid (\Lambda(T_1) \setminus X)$ and $T_2' = T_2 \mid (\Lambda(T_2) \setminus X)$ in $O(n)$ time. For any node $u \in U$, let $T_1'(u)$ and $T_2'(u)$ be the node in $T_1'$ and $T_2'$ respectively corresponding to $u$. Construct a tree $Q$ as follows: initially, set $Q = T_1'$, then for each edge $(u, v)$ of $U$, where we assume $u$ is the parent of $v$, replace the edge in $Q$ between $T_1'(v)$ and its parent with the path in $T_2'$ between $T_2'(v)$ and $T_2'(u)$. $Q$ can be constructed using a total of $O(n)$ time. It is straightforward to show that $Q$ is a maximum agreement supertree of $\mathcal{T}$. □

The running time given in Theorem 1 is $O(n \log n)$ for two trees whose degrees are bounded by a constant and $O(n^{1.5})$ for two trees with unrestricted degrees.

The NP-hardness of MAST for any fixed $k \ge 3$ when $D$ is unrestricted [3] together with Lemma 1 yields the following theorem (in fact, Lemma 1 can be used to show that the inapproximability results of [14] and [15] for MAST mentioned in Section 1.2 hold for MASP as well).

Theorem 2. For any fixed $k \ge 3$, MASP with unrestricted $D$ is NP-hard.

3 A Polynomial-Time Algorithm for $D = 2$, $k = O(1)$

In this section, we show how MASP restricted to $D = 2$ can be reduced to MAST for a set of $k$ distinctly leaf-labeled binary trees having $O((2n)^{k^2})$ leaves.² Hence, we can solve MASP with $D = 2$ in polynomial time if $k = O(1)$.

² The proofs and figures in this section have been omitted due to space constraints. They can be found in the full-length version of our paper.
Without loss of generality, assume that every $a \in S$ appears in at least two trees in $\mathcal{T}$. (If $a$ appears in exactly one tree in $\mathcal{T}$, we can obtain a maximum agreement supertree of $\mathcal{T}$ as follows: (1) remove $a$ from $\mathcal{T}$; (2) compute a maximum agreement supertree $T'$ for the modified $\mathcal{T}$; and (3) insert $a$ into $T'$ according to its position in the original $\mathcal{T}$, as described in the proof of Theorem 1 above.)

MASP is first transformed to MAST for non-distinctly leaf-labeled trees; then, the latter problem is transformed to MAST. Here, by an agreement subtree of a set $\mathcal{R} = \{R_1, R_2, \ldots, R_k\}$ of non-distinctly leaf-labeled trees, we mean a distinctly leaf-labeled tree which is a homeomorphic subtree of every $R_i \in \mathcal{R}$.

We now describe our transformation from MASP to MAST for a set $\mathcal{R} = \{R_1, R_2, \ldots, R_k\}$ of non-distinctly leaf-labeled binary trees. To obtain each $R_i$:

1. Set $R_{i,0} = T_i$.

2. For $j = 1$ to $k$, do

a) Let $L = \Lambda(T_j) \setminus \bigcup_{\ell \in \{1, \ldots, j-1\} \cup \{i\}} \Lambda(T_\ell)$ and let $U = T_j \mid L$.

b) Initially, set $R_{i,j} = R_{i,j-1}$. Generate $|R_{i,j-1}| - 1$ copies of $U$ and attach one to every edge of $R_{i,j}$. Let $r$ be a new node having the current $R_{i,j}$ and another copy of $U$ as its two subtrees, and make $r$ the root of $R_{i,j}$.

3. Set $R_i = R_{i,k}$.

Based on the above construction, for every $i$, any label in $\Lambda(T_i)$ appears exactly once in $R_i$, and $T_i$ is a homeomorphic subtree of $R_i$. Also, $\mathcal{R}$ satisfies:

Lemma 2. For every $R_i \in \mathcal{R}$, the number of leaves in $R_i$ is at most $(2n)^k$ and the height of $R_i$ is at most $2^k n$.

Lemma 3. For any tree $X$ which is distinctly leaf-labeled by some $S' \subseteq S$, $X$ is an agreement supertree of $\mathcal{T}$ if and only if $X$ is an agreement subtree of $\mathcal{R}$.

Next, we transform MAST for the set $\mathcal{R}$ of non-distinctly leaf-labeled binary trees to MAST for a set $\mathcal{P} = \{P_1, P_2, \ldots, P_k\}$ of binary trees which are distinctly leaf-labeled by $\{a_{b_1, b_2, \ldots, b_k} \mid a \in S,\ 1 \le t \le k,\ 1 \le b_t \le \gamma_a[t]\}$, where $\gamma_a[i]$ is the number of occurrences of leaf label $a$ in $R_i$.

To describe the transformation, we need some additional notation. For every $a \in S$, define $a([b_1..d_1], [b_2..d_2], \ldots, [b_k..d_k])$, where $b_t \le d_t$ for all $1 \le t \le k$, to be a rooted caterpillar with $\prod_{t=1}^{k} (d_t - b_t + 1)$ leaves labeled (in order of non-decreasing distance from the root) by $a_{b_1, b_2, \ldots, b_k}$, $a_{b_1, b_2, \ldots, b_k + 1}$, $\ldots$, $a_{d_1, d_2, \ldots, d_k}$. Define $\bar{a}([b_1..d_1], [b_2..d_2], \ldots, [b_k..d_k])$ as the reversed caterpillar of $a([b_1..d_1], [b_2..d_2], \ldots, [b_k..d_k])$. For every leaf in $R_i$ labeled by $a$, such a leaf is called the $j$th occurrence of $a$ in $R_i$ if, according to pre-order traversal of $R_i$, it is the $j$th visited leaf which is labeled by $a$.

For $i = 1, 2, \ldots, k$, the tree $P_i$ is constructed from $R_i$ by replacing, for every $a \in S$, the leaf labeled by $a$ with a caterpillar tree $a()$ or $\bar{a}()$ as follows.

1. Set $P_i = R_i$.

2. For every $a \in S$,

- if $T_i$ is the first tree containing $a$ among $T_1, T_2, \ldots, T_k$, then (in this case, $P_i$ contains exactly one $a$, that is, $\gamma_a[i] = 1$) replace $a$ in $P_i$ by the caterpillar $a([1..\gamma_a[1]], \ldots, [1..\gamma_a[i-1]], [1..1], [1..\gamma_a[i+1]], \ldots, [1..\gamma_a[k]])$.

- else for $j = 1, 2, \ldots, \gamma_a[i]$, replace the $j$th occurrence of $a$ in $P_i$ by the caterpillar $\bar{a}([1..\gamma_a[1]], \ldots, [1..\gamma_a[i-1]], [j..j], [1..\gamma_a[i+1]], \ldots, [1..\gamma_a[k]])$.
It is easy to check that each $P_i$ is distinctly labeled by $\{a_{b_1, b_2, \ldots, b_k} \mid a \in S,\ 1 \le t \le k,\ 1 \le b_t \le \gamma_a[t]\}$. In addition, for every label $a \in S$, there exists exactly one tree $P_i$ which contains the caterpillar $a()$ while the rest of the trees in $\mathcal{P}$ contain caterpillars of the form $\bar{a}()$. Below, more properties of $\mathcal{P}$ are described.

Lemma 4. For every $P_i$, $|\Lambda(P_i)| = O((2n)^{k^2})$.

Lemma 5. For any $a \in S$, a MAST of $\mathcal{P}$ has $\le 2$ leaves of the form $a_{b_1, b_2, \ldots, b_k}$.

Lemma 6. For any integer $x$, the size of the MAST of $\mathcal{R}$ is $\ge x$ if and only if the size of the MAST of $\mathcal{P}$ is $\ge 2x$.

A MAST of $\mathcal{P}$ can now be computed by applying the algorithm of Bryant [5] or Farach et al. [9] (see Section 1.2) to $\mathcal{P}$. Since the number of leaves in $\mathcal{P}$ is less than $(2n)^{k^2}$ and all trees are binary, we obtain the main theorem of this section.

Theorem 3. Given a set of $k$ binary trees $\mathcal{T}$ which are labeled by $n$ distinct labels, their maximum agreement supertree can be computed in $O(k(2n)^{3k^2})$ time.

4 MASP with $D = 2$ Is NP-Hard

Theorem 2 states that MASP is an NP-hard problem for any fixed $k \ge 3$ when $D$ is unrestricted. We now show that MASP remains NP-hard if restricted to instances with $D = 2$ but where $k$ is left unrestricted. In fact, we prove that MASP is NP-hard even if all of the input trees are required to be rooted triplets. Our NP-hardness proof consists of a polynomial-time reduction from the independent set problem, which is known to be NP-hard (see, e.g., [13]).

The independent set problem

Instance: An undirected graph $G = (V, E)$ and a positive integer $I$.

Question: Is there a subset $V'$ of $V$ with $|V'| = I$ such that $V'$ is an independent set, i.e., such that no two vertices in $V'$ are joined by an edge in $E$?

The maximum agreement supertree problem restricted to rooted triplets, decision problem version (MASPR-d)

Instance: A set $\mathcal{T}$ of rooted triplets with leaf set $S$ and a positive integer $K$.

Question: Is there a subset $S'$ of $S$ with $|S'| = K$ which is consistent with $\mathcal{T}$?



Theorem 4. MASP is NP-hard even if restricted to rooted triplets. 

Proof. Given an arbitrary instance (G, /) of the independent set problem, con- 
struct an instance of MASPR-d as follows. Let S = Pu{ze|e€i?} and set 
K = I -\- \E\. For each edge e in E, include the two rooted triplets {{a,Ze},b) 
and ({b,Ze},a) in T, where e = {a,b}. Claim: G has an independent set of size I 
if and only if there exists a subset S' of S of size K which is consistent with T. 
Proof of claim: Suppose there exists an independent set $W$ in $G$ of size $I$.
Then $S' = W \cup \{z_e \mid e \in E\}$ with $|S'| = I + |E|$ is consistent with $T$ since $T \mid S'$
contains no rooted triplets (if $T \mid S'$ had a rooted triplet $(\{x, z_{\{x,y\}}\}, y)$ then $x$
and $y$ would be joined by an edge in $E$ and thus could not both belong to $W$).

Conversely, suppose there exists a consistent subset $S'$ of $S$ of size $K$. For
each $\{x, y\} \in E$, if $z_{\{x,y\}} \notin S'$ but at least one of $x$ and $y$ belongs to $S'$ then
replace $x$ or $y$ in $S'$ by $z_{\{x,y\}}$, and if none of $x$, $y$, and $z_{\{x,y\}}$ are contained in $S'$
then replace any element in $S'$ belonging to $V$ by $z_{\{x,y\}}$ (such an element always
exists because $K > |E|$). The resulting set $S''$ will have the form $W \cup \{z_e \mid e \in E\}$
with $W \subseteq V$ and $|S''| = K$, and will still be consistent with $T$. Next, observe
that by the construction of $T$, for each $\{x, y\} \in E$ at most two of $x$, $y$, and
$z_{\{x,y\}}$ can be included in any subset of $S$ which is consistent with $T$. Therefore,
for each $\{x, y\} \in E$, since $z_{\{x,y\}} \in S''$ it holds that $S''$ cannot contain both $x$
and $y$. Thus, $W$ is an independent set and $|W| = K - |E| = I$.

Hence, MASPR-d is NP-hard and the theorem follows. □ 
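
The construction used in the proof of Theorem 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the encoding of a rooted triplet $(\{x, y\}, w)$ as a pair `(frozenset({x, y}), w)`, and the tuple encoding of the new leaves $z_e$ are choices made here, not notation from the paper. The sketch builds $(S, T, K)$ from $(G, I)$ and computes the restriction $T \mid S'$ as the triplets whose three leaves all lie in $S'$; it does not implement a general consistency test.

```python
def build_maspr_instance(vertices, edges, size):
    """Map an independent-set instance (G, I) to a MASPR-d instance (S, T, K).

    Assumption of this sketch: vertex names are not tuples, so the new
    leaves z_e (encoded as tuples) cannot clash with them.
    """
    # One new leaf z_e per edge e.
    z = {frozenset(e): ("z", tuple(sorted(e))) for e in edges}
    leaves = set(vertices) | set(z.values())          # S = V ∪ {z_e | e ∈ E}
    triplets = set()
    for a, b in edges:
        ze = z[frozenset((a, b))]
        triplets.add((frozenset({a, ze}), b))         # ({a, z_e}, b)
        triplets.add((frozenset({b, ze}), a))         # ({b, z_e}, a)
    return leaves, triplets, size + len(edges)        # K = I + |E|


def restrict(triplets, subset):
    """Return T | S': the triplets whose three leaves all lie in `subset`."""
    return {(pair, w) for (pair, w) in triplets if pair <= subset and w in subset}


# Path a - b - c with the independent set W = {a, c}.
S, T, K = build_maspr_instance(["a", "b", "c"], [("a", "b"), ("b", "c")], 2)
S_prime = {"a", "c"} | {leaf for leaf in S if isinstance(leaf, tuple)}
```

For the independent set $W = \{a, c\}$ the restricted triplet set is empty, which matches the forward direction of the claim; adding $b$ to the subset brings both triplets of each incident edge back.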

5 A Polynomial-Time (n/ log n)- Approximation 
Algorithm 

By the comments preceding Theorem 2, it is highly unlikely that MASP in its 
general form can be solved exactly or even approximated efficiently (say, within 
a constant factor) in polynomial time. However, we can adapt one of Akutsu and 
Halldorsson’s [2] algorithms for the largest common subtree problem to obtain 
the following polynomial-time (n/logn)-approximation algorithm for MASP: 

Arbitrarily partition $S$ into $\lfloor n/\log n \rfloor$ sets $S_1, S_2, \ldots, S_{\lfloor n/\log n \rfloor}$, each of
size at most $\lceil \log n \rceil + 1$. Then, check every subset $S_i'$ of every set $S_i$ to
see if $S_i'$ is consistent with $T$, and let $Z$ be one such subset of maximum
cardinality. Return $Z$.

To see that this algorithm always returns a solution with at least $(\log n)/n$ times
the number of leaves in an optimal solution, let $S^*$ be a maximum consistent leaf
subset. Because of the pigeonhole principle, at least one of $S_1, S_2, \ldots, S_{\lfloor n/\log n \rfloor}$
contains at least $|S^*| / \lfloor n/\log n \rfloor$ of the elements in $S^*$; thus,
$|Z| \ge |S^*| / \lfloor n/\log n \rfloor \ge \frac{\log n}{n} \cdot |S^*|$.
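
The partition-and-enumerate scheme above can be sketched as follows. The sketch is hedged in several ways: `is_consistent` is a hypothetical oracle supplied by the caller (in the paper this role is played by an algorithm for TASP), logarithms are taken to base 2, and the partition simply cuts the leaf list into consecutive chunks of size $\lceil \log n \rceil + 1$.

```python
import math
from itertools import combinations


def approx_max_consistent(leaves, is_consistent):
    """Return a largest consistent subset found inside one block of the partition.

    `is_consistent` is a caller-supplied predicate on sets of leaves
    (a stand-in for the consistency test; hypothetical in this sketch).
    """
    leaves = list(leaves)
    n = len(leaves)
    if n == 0:
        return set()
    block = math.ceil(math.log2(n)) + 1               # block size ceil(log n) + 1
    groups = [leaves[i:i + block] for i in range(0, n, block)]
    best = set()
    for group in groups:
        # Only subsets strictly larger than the current best can improve it.
        for r in range(len(group), len(best), -1):
            found = next(
                (set(c) for c in combinations(group, r) if is_consistent(set(c))),
                None,
            )
            if found is not None:
                best = found
                break
    return best
```

Each block has $O(\log n)$ leaves, so the number of subsets enumerated per block is polynomial in $n$; the quality guarantee is the pigeonhole argument given above.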

To implement the algorithm efficiently, we first note that the deterministic 
algorithm for dynamic graph connectivity employed in the algorithm for TASP 
of Henzinger et al. [16] can be replaced with a more recent one due to Holm et 
al. [17] to yield the following improvement. We then obtain Theorem 5 below. 

Lemma 7. TASP is solvable in $\min\{O(N \log^2 n),\ O(N + n^2 \log n)\}$ time, where
$N$ is the total number of nodes in $T$.

Theorem 5. MASP can be approximated within a factor of n/log n in O(n^ ·
log log n) · min{O(k log log n), O(k + log n)} time. MASP restricted to rooted
triplets can be approximated within a factor of n/log n in O(k + log^ n) time.

Finally, we remark that MAST can be approximated within a factor of n/log n
in O(kn^) time using the same technique.
6 Concluding Remarks

Below, we summarize our results on how restricting the parameters D and k
affects the computational complexity of MASP. Arrows indicate when a result
follows directly from another by generalization (for example, MASP with D = 2
and unrestricted k is NP-hard, so the more general case D = O(1) and
unrestricted k cannot be any easier) or by specialization (e.g., the algorithm for
D = O(1) and k = 2 still works for the more restricted case D = 2 and k = 2).



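MASP            | k = 2                  | k = O(1)                    | k unrestricted
----------------+------------------------+-----------------------------+---------------------
D = 2           | O(n log n) (↓)         | O(k(2n)^{3k^2}) (Theorem 3) | NP-hard (Theorem 4)
D = O(1)        | O(n log n) (Theorem 1) | Open                        | NP-hard (↑)
D unrestricted  | O(n^{1.5}) (Theorem 1) | NP-hard (Theorem 2)         | NP-hard (← or ↑)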



We have also described a polynomial-time (n/logn)-approximation algo- 
rithm for MASP (Theorem 5). 

It is interesting to note that MASP with D = 2 and unrestricted k is NP-hard
while on the other hand, MAST with D = 2 and unrestricted k can be solved
in O(kn^3) time, i.e., in polynomial time, using the algorithm of Bryant [5] or
Farach et al. [9] (see Section 1.2). This means that for certain restrictions on
the parameters D and k, MASP and MAST cannot have the same computational
complexity unless P = NP. Furthermore, although our results indicate
that MASP is computationally harder than MAST, the maximum refinement
subtree problem (see Section 1.2) does not seem any easier than MASP since it
is NP-hard already for k = 2 when D is unrestricted [15].

An open problem is to determine the computational complexity of MASP
with D = O(1) and k = O(1). We believe that this case is solvable in polynomial
time. We would also like to know if the running time of our algorithm for the
case D = 2 and k = O(1) can be improved.
Abstract. The even cycle problem for both undirected [Tho88] and directed
[RST99] graphs has been the topic of intense research in the last decade.
In this paper, we study the computational complexity of cycle length modularity
problems. Roughly speaking, in a cycle length modularity problem, given an
input (undirected or directed) graph, one has to determine whether the graph has a
cycle $C$ of a specific length (or one of several different lengths), modulo a fixed
integer. We denote the two families (one for undirected graphs and one for
directed graphs) of problems by $(S, m)$-UC and $(S, m)$-DC, where $m \in \mathbb{N}$ and
$S \subseteq \{0, 1, \ldots, m-1\}$. $(S, m)$-UC (respectively, $(S, m)$-DC) is defined as
follows: Given an undirected (respectively, directed) graph $G$, is there a cycle in $G$
whose length, modulo $m$, is a member of $S$? In this paper, we fully classify (i.e.,
as either polynomial-time solvable or as NP-complete) each problem $(S, m)$-UC
such that $0 \in S$ and each problem $(S, m)$-DC such that $0 \notin S$. We also give a
sufficient condition on $S$ and $m$ for the following problem to be polynomial-time
computable: $(S, m)$-UC such that $0 \notin S$.



1 Introduction 

In this paper we study the complexity of problems related to lengths of cycles, modulo a
fixed integer, in undirected and directed graphs. Given $m \in \mathbb{N}$ and
$S \subseteq \{0, 1, \ldots, m-1\}$, we define the following two cycle length modularity problems.

$(S, m)$-UC = {$G$ | $G$ is an undirected graph such that there exists an $\ell \in \mathbb{N}$ such that
$\ell \bmod m \in S$, and there exists a cycle of length $\ell$ in $G$}.

$(S, m)$-DC = {$G$ | $G$ is a directed graph such that there exists an $\ell \in \mathbb{N}$ such that
$\ell \bmod m \in S$, and there exists a directed cycle of length $\ell$ in $G$}.
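
To make the two definitions concrete, the following Python sketch decides membership by brute force on small graphs. It is an illustration only: the function names and the adjacency-dictionary representation are assumptions of this sketch, and the enumeration of simple cycles takes exponential time in general, so it says nothing about the complexity results of the paper.

```python
def cycle_lengths(adj, directed):
    """Return the set of lengths of (simple) cycles.

    adj maps every vertex to the set of its successors (directed case) or
    neighbours (undirected case, where adjacency must be symmetric).
    """
    order = {v: i for i, v in enumerate(adj)}
    lengths = set()

    def extend(start, v, visited, length):
        for w in adj[v]:
            if w == start:
                # An undirected cycle needs at least 3 edges; this rules out
                # walking back along the edge that was just used.
                if directed or length + 1 >= 3:
                    lengths.add(length + 1)
            elif w not in visited and order[w] > order[start]:
                visited.add(w)
                extend(start, w, visited, length + 1)
                visited.remove(w)

    # Every cycle is discovered from its first vertex in the fixed order.
    for s in adj:
        extend(s, s, {s}, 0)
    return lengths


def in_class(adj, S, m, directed):
    """Decide G ∈ (S, m)-DC (directed=True) or G ∈ (S, m)-UC (directed=False)."""
    return any(length % m in S for length in cycle_lengths(adj, directed))
```

For example, an undirected triangle belongs to $(\{1\}, 2)$-UC but not to $(\{0\}, 2)$-UC, while a directed 2-cycle belongs to $(\{0\}, 2)$-DC.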

The most basic cases of cycle length modularity problems are the following
problems for undirected (respectively, directed) graphs: deciding whether a given undirected
(respectively, directed) graph has a cycle of odd length, and deciding whether a given
undirected (respectively, directed) graph has a cycle of even length. We will refer to
these problems as the odd cycle problem for undirected (respectively, directed) graphs,
and the even cycle problem for undirected (respectively, directed) graphs, respectively.
In our notation, these problems are denoted by $(\{1\}, 2)$-UC (respectively, $(\{1\}, 2)$-DC)
and $(\{0\}, 2)$-UC (respectively, $(\{0\}, 2)$-DC). All four of these problems are now known
to be in P.

The odd cycle problem and the even cycle problem are quite different in nature. The 
reason is that if a closed walk of odd length is decomposed into cycles, then there is at 
least one odd cycle in the decomposition. The corresponding statement for even walks 
and even cycles is not true. Since odd closed walks can easily be found in polynomial 
time, it is easy to detect odd cycles. 

It is well known that an undirected graph has an odd cycle if and only if the graph 
is not bipartite. No such simple characterization is known for the case of even cycles. 
However, Thomassen [Tho88] showed that the family of cycles of length divisible by
$m$ has the Erdős–Pósa property [EP65], and then used results from Robertson and
Seymour [RS86] to prove that the even cycle problem for undirected graphs is in P. In
fact, Thomassen proved that, for each $m \in \mathbb{N}$, $(\{0\}, m)$-UC is in P. Even though
Thomassen’s graph minor and tree-width approach to solving the even cycle problem 
is elegant, it means that the algorithm has the drawback of having huge constants in its 
running time. Arkin, Papadimitriou, and Yannakakis [APY91] used a simpler approach 
to give an efficient algorithm for the even cycle problem for undirected graphs. Their 
algorithm is based on their characterization of undirected graphs that do not contain 
even cycles with certain efficiently checkable properties of the biconnected components 
of these graphs. However, unlike Thomassen’s approach, their approach does not seem 
to generalize beyond the $(\{0\}, 2)$-UC case to, say, $(\{0\}, 3)$-UC. A related result from
Yuster and Zwick [YZ97] shows that, for each $k$, the problem of deciding if a given
undirected graph has a cycle of length $2k$ is in P.
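
The characterization mentioned above (an undirected graph has an odd cycle if and only if it is not bipartite) gives a simple linear-time test. The sketch below uses the standard breadth-first 2-colouring; the function name and the adjacency-dictionary input are assumptions of this illustration and do not come from the paper.

```python
from collections import deque


def has_odd_cycle(adj):
    """Return True iff the undirected graph is not bipartite.

    adj maps every vertex to an iterable of its neighbours (symmetric).
    """
    colour = {}
    for root in adj:
        if root in colour:
            continue
        colour[root] = 0
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in colour:
                    colour[w] = 1 - colour[v]   # neighbours get opposite colours
                    queue.append(w)
                elif colour[w] == colour[v]:
                    return True                 # an edge inside one colour class
    return False
```

In the notation of the paper this decides $(\{1\}, 2)$-UC; no comparably simple test is known for even cycles.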

It is interesting to note that even though the algorithms for the two odd cycle prob- 
lems (i.e., undirected and directed) are similar, neither of the two algorithms mentioned 
above (namely, Thomassen [Tho88] and Arkin, Papadimitriou, and Yannakakis [APY91]) 
seems to be able to handle the even cycles case for directed graphs. However, Robertson, 
Seymour, and Thomas [RST99] (the conference version of the paper is by McCuaig et 
al. [MRST97]) prove, via giving a polynomial time algorithm for the problem of de- 
ciding whether a bipartite graph has a Pfaffian orientation, that the even cycle problem 
for directed graphs is also in P. We note that Vazirani and Yannakakis [VY89] proved 
the polynomial-time equivalence of the even cycle problem for directed graphs with the 
following problems: 

1. The problem of checking whether a bipartite graph has a Pfaffian orientation [Kas67],

2. Pólya's problem [Pol13], i.e., given a square $(0, 1)$ matrix $A$, is there a $(-1, 0, 1)$
matrix $B$ that can be obtained from $A$ by changing some of the 1's to $-1$'s such that
the determinant of $B$ is equal to the permanent of $A$, and

3. Given a square matrix $A$ of nonnegative integers, determine if the determinant of $A$
equals the permanent of $A$.

Until the even cycle problem was shown to be in P by Thomassen [Tho88] (for
undirected graphs) and Robertson, Seymour, and Thomas [RST99] (for directed graphs) and
even since then, a lot of interesting research on the even cycle problem and related
problems led to the study of other cycle length modularity problems. Some problems have
been shown to be in P, while others have been shown to be NP-complete. We mention
some that have a close relationship with the problems studied in this paper. As stated
earlier, Thomassen [Tho88] proved that, for each $m \in \mathbb{N}$, the problem of deciding whether
an undirected graph has a cycle of length $\equiv 0 \pmod m$ is in P. That is, for each $m \in \mathbb{N}$,
$(\{0\}, m)$-UC $\in$ P. Arkin, Papadimitriou, and Yannakakis [APY91] study, among other
problems, the problem of deciding whether a directed graph contains cycles of length
$p \pmod m$. They prove, via reduction from the directed subgraph homeomorphism
problem [FHW80], that for all $m > 2$ and for all $p$ such that $0 < p < m$, $(\{p\}, m)$-DC
is NP-complete. They also give a polynomial-time algorithm for the problem of finding
the greatest common divisor of all cycle lengths in a graph, a problem motivated by the
problem of finding the period of a Markov chain. Furthermore, they prove that, for all
$m \ge 2$ and for all $0 \le p < m$, the problem of deciding whether all cycles in an undirected
graph are of length $p \pmod m$ is in P. Galluccio and Loebl [GL96] study the complexity of
the corresponding problem in directed graphs. They prove that for the case of planar
directed graphs, checking whether all cycles in the input graph are of length $p \pmod m$
can be done in polynomial time.

In this paper, we resolve the complexity (as either in P or NP-complete) of the
following problems:

1. $(S, m)$-DC, where $0 \notin S$, and

2. $(S, m)$-UC, where $0 \in S$.

We prove that each problem in 2 is in P. For 1, we classify each problem as either
in P or NP-complete, depending only on the properties of $S$ and $m$: If there exist
$0 \le d_1, d_2 < m$ such that $d_1, d_2 \notin S$ and $(d_1 + d_2) \bmod m \in S$, then $(S, m)$-DC
is NP-complete, otherwise $(S, m)$-DC is in P. We also prove a sufficient condition for
$(S, m)$-UC (with $0 \notin S$) to be polynomial-time computable: for each $p \in S$, and for
each $d_1, d_2$ such that $0 \le d_1, d_2 < m$ and $d_1 + d_2 \equiv p \pmod m$, it holds that either
$d_1 \in S$ or $d_2 \in S$. Note that this condition is exactly the same as that for the "in P" case
of 1 given above.

The paper is organized as follows. In Section 2, we introduce the definitions and 
notations that will be used in the rest of the paper. In Section 3, we present results for 
cycle length modularity problems in directed graphs and in Section 4 we present results 
for cycle length modularity problems in undirected graphs. Finally, in Section 5 we 
present some open problems and future research directions. 



2 Definitions and Notations 

In this section we describe the notations used in the rest of the paper. For each finite set
$S$, let $\|S\|$ denote the cardinality of $S$.

An undirected graph $G$ is a pair $(V, E)$, where $V$ is a finite set (the set of vertices or
nodes) and $E \subseteq V \times V$ (the set of edges) with the following properties.

1. For each $u, v \in V$, if $(u, v) \in E$, then $(v, u) \in E$.

2. For each $v \in V$, $(v, v) \notin E$, that is, self-loops are not allowed.
A directed graph $G$ is a pair $(V, E)$, where $V$ is a finite set and $E \subseteq V \times V$. For each
graph $G$, let $V(G)$ denote the set of vertices of $G$, and let $E(G)$ denote the set of edges
of $G$. A walk of length $k$ in a graph $G$ is a sequence of vertices $(u_0, u_1, \ldots, u_k)$ with
$k \ge 1$ in $G$ such that, for each $0 \le i < k$, $(u_i, u_{i+1}) \in E(G)$. A path is a walk where
all vertices are distinct. A closed walk in a graph $G$ is a walk $(u_0, u_1, \ldots, u_k)$ in $G$ such
that $u_0 = u_k$. A cycle in an undirected graph is a closed walk $(u_0, u_1, \ldots, u_{k-1}, u_0)$
of length $\ge 3$ such that $u_0, u_1, \ldots, u_{k-1}$ are $k$ distinct vertices. A cycle in a directed
graph is a closed walk $(u_0, u_1, \ldots, u_{k-1}, u_0)$ such that $u_0, u_1, \ldots, u_{k-1}$ are $k$ distinct
vertices. It should be noted that this definition of a cycle is sometimes called a simple
cycle.

3 Cycle Length Modularity Problems in Directed Graphs 

In this section, we study the complexity of cycle length modularity problems in directed
graphs. Arkin, Papadimitriou, and Yannakakis [APY91] proved that, for each $m > 2$
and each $0 < r < m$, $(\{r\}, m)$-DC is NP-complete. In Theorem 1, we generalize their
result. For each $m$ and $S$ such that $0 \notin S$, we give a condition on $S$ and $m$ for $(S, m)$-DC
to be NP-complete. Furthermore, we prove that if the stated conditions on $S$ and $m$ are
not satisfied, then $(S, m)$-DC is in P.

Theorem 1. For all $m > 1$ and $S \subseteq \{1, \ldots, m-1\}$, the following is true:

(i) If there is a $p \in S$, and $d_1 \notin S$, $d_2 \notin S$ such that $0 \le d_1, d_2 < m$ and
$d_1 + d_2 \equiv p \pmod m$, then $(S, m)$-DC is NP-complete.

(ii) Otherwise, $(S, m)$-DC is in P.

Proof. To prove (i), let $p \in S$ and $d_1, d_2 \notin S$ be such that $0 \le d_1, d_2 < m$ and
$d_1 + d_2 \equiv p \pmod m$. We closely follow the proof of Theorem 1 in Arkin, Papadimitriou,
and Yannakakis [APY91]. Fortune, Hopcroft, and Wyllie [FHW80] showed that the
directed subgraph homeomorphism problem is NP-complete for any fixed directed graph
that is not a tree of depth 1. In particular, the following problem is NP-complete:

Given a directed graph $G$ and vertices $s$ and $t$ in $V(G)$, does $G$ contain a cycle
through both $s$ and $t$?

We now specify a polynomial-time function $\sigma$ that reduces this problem to $(S, m)$-DC.
Given a directed graph $G$ and vertices $s, t \in V(G)$, $\sigma((G, s, t))$ outputs the graph $G'$
where $G' = (V', E')$ is defined as follows. (Note that in the steps below, we can assume
that $d_1 \ne 0$ and $d_2 \ne 0$, because if either $d_1$ or $d_2$ is equal to 0, then the preconditions
of (i) cannot be satisfied.)

1. Set $V' := V$. Set $E' := \emptyset$.

2. For every edge $(v, s) \in E(G)$, do the following.

a) Set $V' := V' \cup \{w_j \mid 1 \le j \le d_1 - 1\}$, where the $w_j$'s are new vertices.

b) Set $E' := E' \cup \{(v, w_1), (w_1, w_2), \ldots, (w_{d_1 - 1}, s)\}$.

3. For every edge $(v, t) \in E(G)$, do the following.

a) Set $V' := V' \cup \{w_j \mid 1 \le j \le d_2 - 1\}$, where the $w_j$'s are new vertices.

b) Set $E' := E' \cup \{(v, w_1), (w_1, w_2), \ldots, (w_{d_2 - 1}, t)\}$.

4. For every edge $(v, w) \in E(G)$ such that $w \notin \{s, t\}$, do the following.

a) Set $V' := V' \cup \{w_j \mid 1 \le j \le m - 1\}$, where the $w_j$'s are new vertices.

b) Set $E' := E' \cup \{(v, w_1), (w_1, w_2), \ldots, (w_{m-1}, w)\}$.

It is easy to see that the cycles in G' have the following properties. 

1. All cycles in $G'$ going through neither $s$ nor $t$ have length $\equiv 0 \pmod m$.

2. All cycles in $G'$ going through $s$ but not through $t$ have length $\equiv d_1 \pmod m$.

3. All cycles in $G'$ going through $t$ but not through $s$ have length $\equiv d_2 \pmod m$.

4. All cycles in $G'$ going through $s$ and $t$ have length $\equiv (d_1 + d_2) \pmod m \equiv p \pmod m$.

Roughly speaking, we replace each edge $e \in E(G)$ that ends in $s$ by a series of
$d_1$ edges in $G'$ such that the series of edges ends in $s$. Similarly, we replace each edge
$e \in E(G)$ that ends in $t$ by a series of $d_2$ edges in $G'$ that ends in $t$. It is clear from
the construction of $G'$ that there is a cycle through $s$ and $t$ in $G$ if and only if there is a
cycle through $s$ and $t$ in $G'$. Since $\{0, d_1, d_2\} \cap S = \emptyset$ and $p \in S$, it follows from the
properties stated above that there is a cycle through $s$ and $t$ in $G$ if and only if there is a
cycle of length $\equiv p \pmod m$ in $G'$. Also, it is clear that $G'$ can be computed from $G$ in
polynomial time. It follows that $(S, m)$-DC is NP-hard. Note that, for each $S$ and $m$,
$(S, m)$-DC is clearly in NP. Thus, $(S, m)$-DC is NP-complete.
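
The edge-subdivision step of the reduction can be sketched as follows. This is a hedged illustration: the function names and the edge-list representation are assumptions of this sketch, every edge into $s$ (respectively $t$) becomes a path of $d_1$ (respectively $d_2$) edges and every other edge becomes a path of $m$ edges, and the cycle enumerator is a brute-force helper that is only suitable for checking tiny examples.

```python
from itertools import count


def subdivide(edges, s, t, d1, d2, m):
    """Build the edge set of G' from the directed edge list of G (d1, d2 >= 1)."""
    fresh = count()
    new_edges = set()
    for v, w in edges:
        length = d1 if w == s else d2 if w == t else m
        # Chain v -> w_1 -> ... -> w_{length-1} -> w with new inner vertices.
        chain = [v] + [("w", next(fresh)) for _ in range(length - 1)] + [w]
        new_edges.update(zip(chain, chain[1:]))
    return new_edges


def directed_cycle_lengths(edges):
    """Brute-force set of directed cycle lengths (exponential time)."""
    adj = {}
    for v, w in edges:
        adj.setdefault(v, set()).add(w)
        adj.setdefault(w, set())
    order = {v: i for i, v in enumerate(adj)}
    lengths = set()

    def extend(start, v, visited, length):
        for w in adj[v]:
            if w == start:
                lengths.add(length + 1)
            elif w not in visited and order[w] > order[start]:
                visited.add(w)
                extend(start, w, visited, length + 1)
                visited.remove(w)

    for start in adj:
        extend(start, start, {start}, 0)
    return lengths
```

With $m = 3$, $S = \{2\}$ and $d_1 = d_2 = 1$, a graph whose only cycle passes through both $s$ and $t$ is mapped to a graph with a cycle of length $\equiv 2 \pmod 3$, whereas a graph whose cycles each avoid one of $s$, $t$ is mapped to a graph whose cycle lengths are all $\equiv 1 \pmod 3$.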

We will now prove (ii). Let $m > 1$ and $S \subseteq \{1, \ldots, m-1\}$ be such that for all
$p \in S$ and all $d_1, d_2$, if $0 \le d_1, d_2 < m$ and $d_1 + d_2 \equiv p \pmod m$, then $d_1 \in S$ or
$d_2 \in S$.

We claim that the following algorithm solves $(S, m)$-DC in polynomial time:

Input: A directed graph $G$.

1. for each $p \in S$ do

2. if $G$ has a closed walk of length $\equiv p \pmod m$ then accept.

3. reject.

Clearly, step 2 can be done in polynomial time. If the algorithm rejects, then obviously
$G$ is not in $(S, m)$-DC. To complete the proof of (ii), we will prove the following claim.
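
The paper does not spell out how step 2 is carried out. One possible way, sketched here as an assumption rather than as the authors' method, is a breadth-first search over pairs (vertex, walk length mod $m$): a closed walk of length $\equiv p \pmod m$ through a vertex $v$ exists exactly when the pair $(v, p)$ can be reached from $v$ by at least one edge. The function names and the adjacency-dictionary input are choices of this sketch.

```python
from collections import deque


def has_closed_walk(adj, m, p):
    """True iff some closed walk of length >= 1 has length ≡ p (mod m).

    adj maps every vertex of the directed graph to an iterable of successors.
    """
    for start in adj:
        seen = set()
        queue = deque()
        for w in adj[start]:                      # walks of length 1
            state = (w, 1 % m)
            if state not in seen:
                seen.add(state)
                queue.append(state)
        while queue:
            v, r = queue.popleft()
            if v == start and r == p % m:
                return True
            for w in adj[v]:
                nxt = (w, (r + 1) % m)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


def decide_dc(adj, S, m):
    """The accept/reject loop above; only correct when condition (ii) holds for S, m."""
    return any(has_closed_walk(adj, m, p) for p in S)
```

The search space has at most $m \cdot \|V\|$ states per start vertex, so the whole procedure is polynomial for fixed $m$.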

Claim. If $G$ has a closed walk $W$ of length $\equiv p \pmod m$ for some $p \in S$, then $G$ has
a cycle of length $\equiv p' \pmod m$ for some $p' \in S$.

Proof. The proof is by induction on the length of $W$. The claim is certainly true for all
closed walks $W$ of length 1. Assume that the claim is true for all closed walks $W$ whose
length is less than $k$. Suppose $G$ has a closed walk $W$ of length $k$ with $k \bmod m = p$
and $p \in S$. Distinguish the following two cases.

Case 1: $W$ is a cycle.

Then we are done.

Case 2: $W$ is not a cycle.

Then there exist $\ell_1 > 0$, $\ell_2 > 0$, $d_1 < m$, and $d_2 < m$ such that $W$ can be
decomposed into a simple cycle $C$ of length $\ell_1$ and a closed walk $W'$ of length $\ell_2$
such that $\ell_1 \equiv d_1 \pmod m$, $\ell_2 \equiv d_2 \pmod m$. Since $\ell_1 + \ell_2 = k$, it follows that
$\ell_1 + \ell_2 \equiv d_1 + d_2 \equiv p \pmod m$. We know that $d_1 \in S$ or $d_2 \in S$. If $d_1 \in S$ then
we are done. If $d_2 \in S$ we are done by the induction hypothesis.

Thus, this claim holds, and so Theorem 1 holds. □

As an immediate corollary, we get that the problem of deciding whether all cycles in a
directed graph have length $\equiv 0 \pmod m$ is in P.

Corollary 2. For each $m \in \mathbb{N}$, $(\{1, 2, \ldots, m-1\}, m)$-DC $\in$ P.

We note that Corollary 2 also follows from the fact that finding the period (greatest
common divisor of all cycle lengths) of a graph is in P [BV73] (see also [Knu73,
Tar72]).

Yuster and Zwick [YZ97] proved that for directed graphs, a shortest odd length
cycle can be found in time $O(\|V\| \cdot \|E\|)$. We show that for all $(S, m)$-DC problems
satisfying the condition (ii) of Theorem 1, a shortest cycle with length, modulo $m$, in $S$
can be found in time $O(M(\|V\|) \cdot \log \|V\|)$, where $M(n)$ is the complexity
of boolean matrix multiplication. For the special case $m = 2$, $S = \{1\}$, the algorithm
is for dense graphs an improvement over the one given in [YZ97].

Theorem 3. For all $m \ge 2$ and $S \subseteq \{0, \ldots, m-1\}$ with $0 \notin S$ the following is true:
If for all $p \in S$, and all $d_1, d_2$ such that $0 \le d_1, d_2 < m$ and $d_1 + d_2 \equiv p \pmod m$,
it holds that $d_1 \in S$ or $d_2 \in S$, then there is an $O(M(\|V\|) \cdot \log \|V\|)$ time algorithm
that computes a shortest cycle $C$ such that the length of $C$, modulo $m$, is in $S$.

Proof. If the precondition of Theorem 3 holds, every closed walk whose length, modulo
$m$, belongs to $S$, is a cycle or decomposes into cycles such that the length of at least
one of these cycles, modulo $m$, belongs to $S$. Hence the problem reduces to finding a
shortest closed walk whose length, modulo $m$, belongs to $S$.

Let $G = (V, E)$, where $V = \{v_1, \ldots, v_n\}$. For every $r \in \{0, \ldots, m-1\}$ and
$0 < k \le n$, we define the boolean matrix $A_{k,r}$ by $A_{k,r}(i, j) = 1$ iff there is a walk of
length $\ell$ from $v_i$ to $v_j$ in $G$ with $0 < \ell \le k$ and $\ell \equiv r \pmod m$. With $O(\log n)$ boolean
matrix multiplications we can determine $k_{\min}$, the length of the desired closed walk.
The value of $k_{\min}$ equals the smallest $k$ with $A_{k,r'}(i, i) = 1$ for some $i \in \{1, \ldots, n\}$
and $r' \in S$. First, compute the matrices $A_{k,r}$ where $k$ is a power of 2, using the identity

$A_{2k,r} = \bigvee_{i=0}^{m-1} \left( A_{k,i} \wedge A_{k,(r-i) \bmod m} \right) \vee A_{k,r}$

where $\wedge$ and $\vee$ stand for boolean matrix multiplication and componentwise 'or',
respectively. Note that $A_{1,1}$ is the adjacency matrix of $G$, and $A_{1,r}$, $r \ne 1$, is the zero matrix.

After that, apply binary search to determine $k_{\min}$, and a representation of $A_{k_{\min},r'}$ as
product of matrices $A_{k,r}$ with $k$ being a power of 2. A specific closed walk with length
$k_{\min}$ (which, we know, is a cycle) can now easily be found in additional $O(\|V\|^2)$ time.

□
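
The quantity $k_{\min}$ can also be computed by a much simpler, slower procedure, sketched below for illustration. This is not the doubling algorithm of the proof: it scans the walk lengths $1, 2, \ldots, n$ one at a time and keeps, for each vertex, the set of vertices reachable by a walk of exactly the current length. The function name and the encoding of boolean matrix rows as Python integers used as bit sets are assumptions of this sketch.

```python
def shortest_residue_closed_walk(adj, m, S):
    """Length of a shortest closed walk whose length mod m lies in S, or None.

    adj is a list: adj[v] holds the successors of vertex v, with vertices 0..n-1.
    Under the precondition of Theorem 3 the result is the length of a
    shortest cycle with that property, so scanning lengths 1..n suffices.
    """
    n = len(adj)
    rows = [sum(1 << w for w in set(adj[v])) for v in range(n)]
    current = rows[:]            # bit j of current[i]: walk of the current length i -> j
    for length in range(1, n + 1):
        if length % m in S and any((current[i] >> i) & 1 for i in range(n)):
            return length
        nxt = []
        for i in range(n):
            acc = 0
            for j in range(n):
                if (current[i] >> j) & 1:
                    acc |= rows[j]          # extend every walk i -> j by one edge
            nxt.append(acc)
        current = nxt
    return None
```

The scan performs up to $n$ boolean matrix products instead of the $O(\log n)$ used in the proof, so it only serves to make the definition of $k_{\min}$ tangible.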

4 Cycle Length Modularity Problems in Undirected Graphs 

In this section, we study the complexity of problems $(S, m)$-UC, for different $S$ and $m$.
The case when $S = \{0\}$ has been shown to be in P by Thomassen [Tho88]. We extend
Thomassen's result and prove that for all $S$ such that $0 \in S$, $(S, m)$-UC is in P.
Theorem 4. For each $m$, and each $S \subseteq \{0, \ldots, m-1\}$ such that $0 \in S$, $(S, m)$-UC $\in$ P.

The proof of Theorem 4 is an extension of the proof of Thomassen's result for
$(\{0\}, m)$-UC, which in turn is based on the result from Robertson and Seymour [RS86]
for the $k$-disjoint paths problem. We will need the following results related to tree-widths
for the proof of Theorem 4. Tree-width is an invariant of graphs that has been a central
concept in the development of algorithms for fundamental problems in graph theory.
See [RS86] for a definition of tree-width, and [RS85] for a survey on graph minor
results. We will not define tree-widths because the definition is rather involved and for
the proof of Theorem 4 we need to know only the following fact about tree-widths of
graphs.

Theorem 5 ([RS86]). For each $t \in \mathbb{N}$, there is a polynomial-time algorithm for deciding
whether an undirected graph has tree-width at least $t$.

The following theorem shows that, for fixed $m$, all graphs of sufficiently large
tree-width have a cycle whose length is a multiple of $m$.

Theorem 6 ([Tho88]). For each $m$, there exists a $t_m \in \mathbb{N}$ such that, for each undirected
graph $G$ with tree-width at least $t_m$, $G$ contains a cycle of length $\equiv 0 \pmod m$.

Roughly speaking, Theorem 6 allows us to handle those graphs that have large
tree-widths. Theorem 8 allows us to handle small tree-widths.

Definition 7. For each $t, m \in \mathbb{N}$ and $d_1, d_2, \ldots, d_k$ such that, for each $1 \le i \le k$,
$d_i < m$, DISJ-PATH$_{(t,m,d_1,d_2,\ldots,d_k)}$ is defined as follows:

DISJ-PATH$_{(t,m,d_1,d_2,\ldots,d_k)}$ = {$(G, x_1, y_1, \ldots, x_k, y_k)$ | $G$ is an undirected graph
such that (a) $G$ has tree-width at most $t$, (b) for each $i$, $x_i$ and $y_i$ are vertices in $G$,
and (c) there exist $k$ node-disjoint paths $P_1, P_2, \ldots, P_k$ in $G$ such that, for each
$1 \le i \le k$, $P_i$ is a path connecting $x_i$ and $y_i$ such that $P_i$ has length $d_i \pmod m$}.

Theorem 8 ([Tho88]). Let $t, m, d_1, d_2, \ldots, d_k \in \mathbb{N}$ be such that, for each $1 \le i \le k$,
$d_i < m$. Then, DISJ-PATH$_{(t,m,d_1,d_2,\ldots,d_k)}$ is in P.

Proof of Theorem 4. Let $m$ and $S$ be such that $m \in \mathbb{N}$, $S \subseteq \{0, 1, \ldots, m-1\}$, and
$0 \in S$. We will now describe a polynomial-time algorithm that decides $(S, m)$-UC. Let
$G$ be the input graph. Check, using the algorithm in Theorem 5, if $G$ has tree-width at least
$t_m$, where $t_m$ is as in Theorem 6. If so, then, by Theorem 6, $G$ has a cycle of length
$\equiv 0 \pmod m$, and since $0 \in S$ we accept. Otherwise, $G$ has tree-width at most $t_m$. So, we
use Theorem 8 to check if $G$ has a cycle of length $\ell$ such that $\ell \bmod m \in S$. For all
distinct vertices $v_1, v_2, v_3, v_4$ in $G$ such that $\{(v_1, v_2), (v_3, v_4)\} \subseteq E(G)$, for each
$\ell \in S$, and for each $0 \le d_1, d_2 \le m-1$ such that $d_1 + d_2 + 2 \equiv \ell \pmod m$, we do the
following. Check, using Theorem 8, whether there are 2 disjoint paths $P_1$ and $P_2$, $P_1$
between $v_1$ and $v_3$ of length $d_1 \pmod m$ and $P_2$ between $v_2$ and $v_4$ of length
$d_2 \pmod m$. If there are such disjoint paths, then there is a cycle of length
$d_1 + d_2 + 2 \equiv \ell \pmod m$, namely the cycle consisting of the edges in $P_1$,
the edges in $P_2$, and the edges $(v_1, v_2)$ and $(v_3, v_4)$. Note that we may be missing cycles
consisting of 3 nodes or less, but that can be easily handled by checking brute-force for
all cycles of 3 nodes or less. □
Let us consider the complements of the following cycle length modularity problems
(for fixed $m \ge 2$ and fixed $0 \le r < m$): $(\{0, 1, \ldots, m-1\} - \{r\}, m)$-UC. For any
$m$ and $r$ such that $0 \le r < m$, these problems ask whether all cycles in the given
graph are of length $\equiv r \pmod m$. For $m = 2$ and $r = 0$, this problem is the odd cycle
problem in undirected graphs, which as noted in the introduction is easily seen to be in
P based on the simple observation that any closed walk of odd length in a graph must
contain a simple cycle of odd length. For $m = 2$ and $r = 1$, this problem is the even
cycle problem, which is also in P [APY91]. Arkin, Papadimitriou, and Yannakakis in fact
prove, using the properties of triconnected components of graphs, that for each $m$,
and each $0 \le r < m$, finding whether all cycles in a graph are of length $\equiv r \pmod m$
can be done in polynomial time.

Theorem 9 ([APY91]). For each $m \in \mathbb{N}$, and each $r$ such that $0 \le r < m$,
$(\{0, 1, \ldots, m-1\} - \{r\}, m)$-UC $\in$ P.



Corollary 10. $(\{1, 2, \ldots, m-1\}, m)$-UC $\in$ P.

The following theorem is an analog of Theorem 1(ii) for undirected graphs.

Theorem 11. For all $m \ge 2$ and $S \subseteq \{1, \ldots, m-1\}$, the following is true: If for all
$p \in S$, and all $d_1, d_2$ such that $0 \le d_1, d_2 < m$ and $d_1 + d_2 \equiv p \pmod m$, it holds
that $d_1 \in S$ or $d_2 \in S$, then $(S, m)$-UC $\in$ P.

The proof given for the corresponding statement regarding directed graphs does not work
here. The reason is that closed walks in undirected graphs need not decompose properly
into cycles. To see why this is true, consider a closed walk $C$ of length 5 in an undirected
graph: $v_1 v_2 v_3 v_4 v_2 v_1$. Note that even though $C$ is a closed walk of length 5, it is neither
a cycle nor does it decompose properly into cycles, basically because $v_1 v_2 v_1$ is not a
valid cycle.

In order to prove Theorem 11, we reduce the problem to the problem of determining
the period of an undirected graph, which is solvable in polynomial time by the algorithm
from Arkin, Papadimitriou, and Yannakakis [APY91]. We need the following lemma.

Lemma 12. For all $m > 1$ and $S = \{a_1, \ldots, a_n\} \subseteq \{0, \ldots, m-1\}$, $0 \in S$, the
following is true:

If for all $d_1 \in S$, $d_2 \in S$ it holds that $(d_1 + d_2) \bmod m \in S$, then
$S = \{\ell \mid 0 \le \ell < m \text{ and } g \mid \ell\}$ for some $g$ with $g \mid m$.

Proof. Let $S - \{0\} = \{a_1, a_2, \ldots, a_n\}$. Let $g = \gcd(a_1, \ldots, a_n, m)$. From number
theory (see [Apo76]) we know that there exist $k_1, \ldots, k_n, k_{n+1} \in \mathbb{Z}$ such that

$k_1 a_1 + k_2 a_2 + \cdots + k_n a_n + k_{n+1} m = g$.

For all $i$, $1 \le i \le n+1$, let

$k_i' = k_i + m |k_i|$.

Then

$(k_1' a_1 + \cdots + k_n' a_n) \bmod m = g$, (1)

where $k_1', \ldots, k_n' \ge 0$.

For all $d_1 \in S$, $d_2 \in S$ it holds that $(d_1 + d_2) \bmod m \in S$. Hence Eq. (1) implies that
$g \in S$. Furthermore, $sg \in S$ for all $s \in \mathbb{Z}$ such that $0 \le sg < m$.

Since for each $1 \le i \le n$, $g \mid a_i$, it follows that

$S = \{\ell \mid 0 \le \ell < m \text{ and } g \mid \ell\}$.

This concludes the proof of Lemma 12. □
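
Lemma 12 is easy to sanity-check by exhaustive search for small moduli. The following sketch (the function names are assumptions of this illustration) enumerates every subset $S \subseteq \{0, \ldots, m-1\}$ that contains 0 and is closed under addition modulo $m$, and checks that it consists exactly of the multiples of $g = \gcd(S \cup \{m\})$. It is a numerical check for small $m$, not a substitute for the proof.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def closed_under_addition(S, m):
    """True iff (d1 + d2) mod m lies in S for all d1, d2 in S."""
    return all((a + b) % m in S for a in S for b in S)


def multiples_of_divisor(S, m):
    """True iff S = {l | 0 <= l < m and g | l} with g = gcd of S and m, and g | m."""
    g = reduce(gcd, S, m)
    return m % g == 0 and S == {x for x in range(m) if x % g == 0}


def check_lemma(max_m):
    """Check the statement of Lemma 12 for every modulus 2 <= m <= max_m."""
    for m in range(2, max_m + 1):
        for size in range(m):
            for extra in combinations(range(1, m), size):
                S = {0, *extra}
                if closed_under_addition(S, m) and not multiples_of_divisor(S, m):
                    return False
    return True
```

For instance, with $m = 6$ the set $\{0, 2, 4\}$ is closed under addition and equals the multiples of $g = 2$, while $\{0, 2\}$ is not closed because $2 + 2 = 4$ is missing.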



Proof of Theorem 11. Let $\bar{S} = \{0, \ldots, m-1\} - S$. By the precondition of the theorem,
$0 \in \bar{S}$ and $\bar{S}$ is closed under addition modulo $m$, so Lemma 12 implies that

$\bar{S} = \{\ell \mid 0 \le \ell < m \text{ and } g \mid \ell\}$

for some $g$ with $g \mid m$. Hence

$S = \{\ell \mid 0 \le \ell < m \text{ and } g \nmid \ell\}$.

Define

$S' = \{1, \ldots, g-1\}$.

Since $g \mid m$,

$x \bmod m \in S \iff x \bmod g \in S'$

holds for all $x \in \mathbb{N}$. Hence $(S, m)$-UC is equivalent to $(\{1, \ldots, g-1\}, g)$-UC. However,
$(\{1, \ldots, g-1\}, g)$-UC is the set of graphs containing a cycle whose length is not divisible
by $g$, which is in P since the period of a graph (the gcd of all cycle lengths) can be
determined in polynomial time [APY91]. This concludes the proof of Theorem 11. □

5 Conclusion and Open Problems 

In this paper, we studied the complexity of cycle length modularity problems. We
completely characterized (i.e., as either polynomial-time computable or as NP-complete)
each problem $(S, m)$-DC, where $0 \notin S$. We also proved that, for each $S$ such that
$0 \in S$, $(S, m)$-UC is in P, and we proved a sufficient condition on $S$ and $m$ for the
problem $(S, m)$-UC ($0 \notin S$) to be in P. We mention several open problems.

1. Theorem 1 completely characterizes all modularity problems in directed graphs
when $0 \notin S$. Robertson, Seymour, and Thomas [RST99] prove that $(\{0\}, 2)$-DC is
in P. In light of these results, it is natural to ask if $(\{0\}, m)$-DC $\in$ P, for some or
all $m > 2$. Also, the complexity of $(S, m)$-DC such that $0 \in S$ and $m > 2$ is still
open, except for trivial ($S = \{0, 1, \ldots, m-1\}$) cases.

2. Theorem 4 shows that all cycle length modularity problems in undirected graphs
$(S, m)$-UC such that $0 \in S$ are solvable in polynomial time. What about the
complexity of the $(S, m)$-UC problems with $0 \notin S$ which are not covered by Theorem
9 or 11?

3. Theorem 9 shows that, for undirected graphs, the problem of finding whether all
cycles have length $\equiv r \pmod m$ is in P. What is the complexity of the corresponding
problem for directed graphs?
Abstract. In the paper, we present a procedural semantics for fuzzy
disjunctive programs - sets of graded implications of the form:

$(h_1 \vee \cdots \vee h_n \leftarrow b_1 \,\&\, \cdots \,\&\, b_m,\ c)$ $(n > 0,\ m \ge 0)$

where $h_i$, $b_j$ are atoms and $c$ a truth degree from a complete residuated
lattice

$\mathbf{L} = (L, \le, \vee, \wedge, *, \Rightarrow, 0, 1)$.

A graded implication can be understood as a means of the represen- 
tation of incomplete and uncertain information; the incompleteness is 
formalised by the consequent disjunction of the implication, while the 
uncertainty by its truth degree. We generalise the results for Boolean 
lattices in [3] to the case of residuated ones. We take into consideration 
the non-idempotent triangular norm *, instead of the idempotent 
A, as a truth function for the strong conjunction &. In the end, the 
coincidence of the proposed procedural semantics and the generalised 
declarative, fixpoint semantics from [4] will be reached. 

Keywords: Disjunctive logic programming, multivalued logic program- 
ming, fuzzy logic, knowledge representation and reasoning 



1 Introduction 

Because the real world is complex, our knowledge about it is unavoidably
ambiguous. A complete and certain description of even a 'small' part of
reality requires much more detailed information than humans or computer
systems are capable of recognising simultaneously. From the philosophical point
of view, such an exhaustive description of the world is an unreachable platonic
ideal. Nevertheless, humans can understand complex real systems thanks to their
ability to maintain only a generic comprehension of them and to reason
approximately about them. Generally speaking, the ambiguity in our knowledge
is of various natures; we could roughly distinguish the incompleteness and
uncertainty of information. Consequently, in knowledge representation and
commonsense reasoning, we should be able to handle such incomplete and uncertain
information. Attempts to solve many real-world problems lead us to store
and retrieve incomplete and uncertain knowledge in deductive databases in
order to represent an incomplete and uncertain model of the world, and to carry
out a reasonable inference of new facts from this model. Expert systems,
cognitive robots, mechatronic systems, sensor data, sound and image databases, and
temporal indeterminacy are only a few of the fields dealing with incomplete and
uncertain information. In recent years, disjunctive [1,2,11, etc.], multivalued
[9,10,7,8, etc.], and annotated [12,5,6, etc.] logic programming have been recognised
as powerful tools for maintenance of such knowledge bases.

In the paper, we shall aim to combine both the disjunctive and multivalued 
approaches and provide some formalisation of reasoning with incomplete and 
uncertain information represented by graded implications (disjunctions if m = 0) 
of the form: 



$(h_1 \vee \cdots \vee h_n \leftarrow b_1 \,\&\, \cdots \,\&\, b_m,\ c) \quad (n > 0,\ m \geq 0)$

where $h_i$, $b_j$ are atoms and $c$ is a truth degree from a complete residuated lattice

$\mathbf{L} = (L, \leq, \vee, \wedge, *, \Rightarrow, 0, 1)$

with a (triangular) t-norm $*$ and its residuum $\Rightarrow$. Fuzzy disjunctive programs,
from which we shall infer incomplete and uncertain information, will be viewed
as sets of graded implications.

Similar approaches can be found in [9,10,7,8]. The papers [9] and [10] describe
minimal model and stable semantics based on t-norms for multivalued disjunctive
logic programs, while [7] and [8] provide probabilistic semantics of minimal,
stable, perfect models, and least model states. In addition, this paper introduces
a procedural counterpart to the mentioned minimal model semantics for positive
programs (without negation).



2 Basic Notions and Notation 

2.1 Predicate Fuzzy Logic 

Throughout the paper, we shall use common notions of predicate fuzzy logic.
Let $\mathcal{L}$ denote a predicate language.

We shall assume that truth values (degrees) of our fuzzy logic constitute a
lattice $\mathbf{L} = (L, \leq, \vee, \wedge, *, \Rightarrow, 0, 1)$ where

• $(L, \leq, \vee, \wedge, 0, 1)$ is a complete lattice;

• the supremum operator $\vee$ and the infimum operator $\wedge$ are infinitely
distributive, i.e. for all $K \subseteq L$, $a \in L$,

$a \vee \bigwedge K = \bigwedge_{k \in K} (a \vee k), \qquad a \wedge \bigvee K = \bigvee_{k \in K} (a \wedge k);$

• the binary operation $*$ over $L$ is commutative, associative, and non-decreasing
in both arguments; its neutral element is 1, i.e. $1 * a = a$;
Table 1. Truth functions.

connective            truth function
$\vee$                $\vee$
$\wedge$              $\wedge$
$\&$                  $*$
$x \leftarrow y$      $y \Rightarrow x$

• the binary operation $\Rightarrow$ over $L$ is non-increasing in the first argument and
non-decreasing in the second one; and

• for all $\alpha, \beta, \gamma \in L$, $\alpha * \beta \leq \gamma$ iff $\alpha \leq \beta \Rightarrow \gamma$.

In other words, $\mathbf{L}$ is a residuated lattice with the t-norm $*$ and the residuum $\Rightarrow$
which is furthermore complete and infinitely distributive.
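
As a concrete instance of these axioms, the following minimal sketch checks the adjunction condition for the Łukasiewicz t-norm on a finite grid of the unit interval. The choice of the Łukasiewicz operations and all function names are assumptions made for illustration; the paper itself works with an arbitrary complete residuated lattice.

```python
from fractions import Fraction

def luk_tnorm(a, b):
    """Lukasiewicz t-norm: a * b = max(0, a + b - 1)."""
    return max(Fraction(0), a + b - 1)

def luk_residuum(a, b):
    """Lukasiewicz residuum: a => b = min(1, 1 - a + b)."""
    return min(Fraction(1), 1 - a + b)

# Exact rational grid 0, 1/10, ..., 1 (avoids floating-point comparisons).
grid = [Fraction(i, 10) for i in range(11)]

# Adjunction: a * b <= c  iff  a <= (b => c), checked for every grid triple.
adjunction_holds = all(
    (luk_tnorm(a, b) <= c) == (a <= luk_residuum(b, c))
    for a in grid for b in grid for c in grid
)

# Non-idempotence: 1/2 * 1/2 differs from 1/2, unlike the infimum (min).
half = Fraction(1, 2)
is_non_idempotent = luk_tnorm(half, half) != half
```

The second check is the reason the strong conjunction & behaves differently from $\wedge$: repeated use of the same premise lowers the truth degree.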

The language $\mathcal{L}$ contains the following connectives: $\vee$ (disjunction), $\wedge$ (conjunction),
$\&$ (strong conjunction), and $\leftarrow$ (implication). The truth functions of
the connectives are defined in Tab. 1 in the usual way.

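An $\mathbf{L}$-interpretation, say $\mathfrak{A}$, for $\mathcal{L}$ is a structure

$(U_{\mathfrak{A}}, \{f^{\mathfrak{A}} \mid f \in Func_{\mathcal{L}}\}, \{p^{\mathfrak{A}} \mid p \in Pred_{\mathcal{L}}\})$

defined as follows:

• $U_{\mathfrak{A}} \neq \emptyset$ is the universum (domain) of the interpretation $\mathfrak{A}$;

• a function symbol $f \in Func_{\mathcal{L}}$ of arity $n$ is interpreted as a function $f^{\mathfrak{A}} : U_{\mathfrak{A}}^{n} \to U_{\mathfrak{A}}$;

• a predicate symbol $p \in Pred_{\mathcal{L}}$ of arity $n$ is interpreted as a function, an $\mathbf{L}$-fuzzy relation,
$p^{\mathfrak{A}} : U_{\mathfrak{A}}^{n} \to L$.

A variable assignment in $\mathfrak{A}$ is a mapping $e : Var_{\mathcal{L}} \to U_{\mathfrak{A}}$ assigning each variable
an element of the universum $U_{\mathfrak{A}}$. Let $t$ be a term and $\varphi$ a formula of $\mathcal{L}$. In the
standard manner [4], we assign $t$ an element of $U_{\mathfrak{A}}$ and $\varphi$ a truth value of $L$ in
$\mathfrak{A}$ with respect to $e$, denoted by $\|t\|_e^{\mathfrak{A}}$ and $\|\varphi\|_e^{\mathfrak{A}}$, respectively.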

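By a graded formula of $\mathcal{L}$ we mean a pair $(\varphi, c)$ consisting of a formula $\varphi$ of
$\mathcal{L}$ and of a truth degree $c \in L$.

We say that a graded formula $(\varphi, c)$ of $\mathcal{L}$ is true in an $\mathbf{L}$-interpretation $\mathfrak{A}$
for $\mathcal{L}$ with respect to a variable assignment $e$ in $\mathfrak{A}$, written as $\mathfrak{A} \models_e (\varphi, c)$, iff
$\|\varphi\|_e^{\mathfrak{A}} \geq c$. We define the truth value of $\varphi$ in $\mathfrak{A}$, denoted by $\|\varphi\|^{\mathfrak{A}}$, as follows:

$\|\varphi\|^{\mathfrak{A}} = \bigwedge \{\|\varphi\|_e^{\mathfrak{A}} \mid e \text{ is a variable assignment in } \mathfrak{A}\}.$

We say that $(\varphi, c)$ is true in $\mathfrak{A}$, written as $\mathfrak{A} \models (\varphi, c)$, iff $\|\varphi\|^{\mathfrak{A}} \geq c$.

A graded theory of $\mathcal{L}$ is a set of graded formulae of $\mathcal{L}$. An $\mathbf{L}$-interpretation
$\mathfrak{A}$ for $\mathcal{L}$ is an $\mathbf{L}$-model of a graded theory $T$, in symbols $\mathfrak{A} \models T$, iff $\mathfrak{A} \models (\varphi, c)$
for all $(\varphi, c) \in T$.

We say that a graded formula $(\varphi, c)$ of $\mathcal{L}$ is a fuzzy logical consequence of a
graded theory $T$ of $\mathcal{L}$, written as $T \models (\varphi, c)$, iff for every $\mathbf{L}$-interpretation $\mathfrak{A}$ for
$\mathcal{L}$, $\mathfrak{A} \models T$ implies $\mathfrak{A} \models (\varphi, c)$.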
2.2 Graded Disjunctions and Implications

Let $D = d_1 \vee \cdots \vee d_n$ be a disjunction of atoms of $\mathcal{L}$. If $n = 0$, the empty
disjunction is denoted by $\Box$. We put: $\|\Box\|_e^{\mathfrak{A}} = 0$, $|D| = n$, and $Atom(D) =
\{d_i \mid 1 \leq i \leq n\}$. We say that the disjunction $D$ is a disjunctive factor iff there
do not exist indices $i \neq j$, $1 \leq i, j \leq n$, such that $d_i = d_j$. A disjunction of
atoms $D$ is called a subdisjunction of another disjunction of atoms $D'$, written
as $D \sqsubseteq D'$, iff $Atom(D) \subseteq Atom(D')$. We often say that $D$ subsumes $D'$ or $D'$
is subsumed by $D$.

Let $C = c_1 \,\&\, \cdots \,\&\, c_n$ be a conjunction of atoms of $\mathcal{L}$. If $n = 0$, the empty
conjunction is denoted by $\top$. We put: $\|\top\|_e^{\mathfrak{A}} = 1$, $|C| = n$, and $Atom(C) =
\{c_i \mid 1 \leq i \leq n\}$.

An implication of atoms of $\mathcal{L}$ is an implication of the form $D \leftarrow C$ where
$D$ is a non-empty disjunction of atoms and $C$ a conjunction of atoms of $\mathcal{L}$. An
implication of atoms $D \leftarrow C$ is said to be a tautology iff $Atom(D) \cap Atom(C) \neq \emptyset$.

Let $D$, $D'$ and $C$, $C'$ be disjunctions and conjunctions of atoms, respectively.
Let $A$ be a set of atoms. We say that

• $D$ is a disjunctive factor of $A$ iff $D$ is a disjunctive factor and $Atom(D) = A$;

• $D$ is a disjunctive factor of $D'$ iff $D$ is a disjunctive factor and $Atom(D) =
Atom(D')$;

• $C$ is a conjunctive factor of $C'$ iff $C = c_1 \,\&\, \cdots \,\&\, c_n$, $n \geq 0$, there exists a
permutation $\pi$ on the set $\{1, \ldots, n\}$, and $C' = c_{\pi(1)} \,\&\, \cdots \,\&\, c_{\pi(n)}$;

• $D \leftarrow C$ is an implication factor of $D' \leftarrow C'$ iff $D$ is a disjunctive factor of
$D'$ and $C$ is a conjunctive factor of $C'$.

For the sake of simplicity, we shall abbreviate the expression a graded implication
(disjunction) of atoms by a graded implication (disjunction).

A fuzzy disjunctive program of $\mathcal{L}$ is an arbitrary set of graded implications
of $\mathcal{L}$.

2.3 Substitutions 

We shall use the standard concepts of substitutions [4]:

A substitution $\vartheta$ of $\mathcal{L}$ on a finite set $X \subseteq Var_{\mathcal{L}}$ is a mapping $\vartheta : X \to
Term_{\mathcal{L}}$. The domain of $\vartheta$ is $dom(\vartheta) = X$, and $range(\vartheta)$ is the set of all the variables
of $Var_{\mathcal{L}}$ occurring in the terms $\vartheta(x)$, $x \in X$. The set of all substitutions of $\mathcal{L}$ is
denoted as $Subst_{\mathcal{L}}$.

Let $\vartheta$ and $\vartheta'$ be substitutions of $\mathcal{L}$. $\vartheta'$ is a regular extension of $\vartheta$ iff

• $dom(\vartheta) \subseteq dom(\vartheta')$, $\vartheta'|_{dom(\vartheta)} = \vartheta$;

• $\vartheta'|_{dom(\vartheta') - dom(\vartheta)}$ is a variable renaming, and
$range(\vartheta) \cap range(\vartheta'|_{dom(\vartheta') - dom(\vartheta)}) = \emptyset$.

Let $\varphi$ be an open formula. An open formula $\varphi'$ is a variant of $\varphi$ iff there exists
a variable renaming $\rho$ such that $\varphi' = \varphi\rho$.

Let $S$ be a finite non-empty set of terms or open formulae of $\mathcal{L}$, or tuples of
them. A substitution $\vartheta$ of $\mathcal{L}$ is called a unifier for $S$ iff $S\vartheta$ is a singleton. A unifier
$\theta$ for $S$ is said to be a most general unifier, mgu, for $S$ iff for every unifier $\vartheta$ of
$\mathcal{L}$ for $S$, there exists a substitution $\gamma$ of $\mathcal{L}$ such that $\vartheta|_{vars(S)} = \theta|_{vars(S)} \circ \gamma$.
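
To make the notion of a most general unifier concrete, here is a minimal sketch of syntactic unification for two terms. The term encoding (a `Var` class for variables, tuples `(functor, arg, ...)` for compound terms and atoms) and all function names are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable is reached."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term

def occurs(var, term, subst):
    """Occurs check: does var appear inside term under the current bindings?"""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term[1:])
    return False

def unify(s, t, subst=None):
    """Return a most general unifier of s and t as a dict, or None if none exists.

    The result is in triangular form: a binding may mention variables
    that are themselves bound elsewhere in the dict.
    """
    subst = {} if subst is None else subst
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, Var):
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, Var):
        return unify(t, s, subst)
    if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0] and len(s) == len(t):
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

x, y = Var("x"), Var("y")
mgu = unify(("p", y), ("p", ("f", x)))   # unifies p(y) with p(f(x))
no_unifier = unify(x, ("f", x))          # fails the occurs check
```

Here `mgu` binds `y` to `f(x)` and leaves `x` unbound; the identity binding of `x` is simply not stored.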

3 Declarative and Fixpoint Semantics 

In this section, we generalise the declarative and fixpoint semantics for fuzzy 
disjunctive programs proposed in [4]. We proceed from the declarative one: 

Definition 1 (Declarative semantics). Let $P$ be a fuzzy disjunctive program
of $\mathcal{L}$.

$\mathcal{DS}(P) = \{(D, c) \mid (D, c) \text{ is a graded disjunction of } \mathcal{L} \text{ and } P \models (D, c)\}.$

Consider a complete lattice $\mathbf{L} = (L, \leq, \vee, \wedge, 0, 1)$. A mapping $T : L \to L$ is
$\omega$-continuous iff for all $\omega$-chains $X \subseteq L$, $T(\bigvee X) = \bigvee \{T(x) \mid x \in X\}$. We denote
the $\alpha$-th iteration power of $T$ on 0 as $T^{\alpha}$. We say that $x \in L$ is a fixpoint of $T$
iff $T(x) = x$. By the fixpoint theorem (Knaster, Tarski), if $T$ is $\omega$-continuous,
the least fixpoint of $T$ is $\mathit{lfp}(T) = T^{\omega}$.
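
The iteration $T^0 = 0$, $T^{n+1} = T(T^n)$ can be sketched on a finite lattice, where the chain stabilises after finitely many steps. The lattice (subsets of a small set ordered by inclusion), the operator `step`, and the function names are assumptions chosen only to illustrate the construction.

```python
def least_fixpoint(step, bottom=frozenset()):
    """Iterate step from the bottom element until the value stops changing."""
    current = bottom
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt

def step(xs):
    """A monotone operator on subsets of {0, ..., 9}: add 0, and x + 2 for each x already present."""
    return frozenset({0} | {x + 2 for x in xs if x + 2 < 10})

evens = least_fixpoint(step)
```

The successive iterates are the empty set, {0}, {0, 2}, and so on up to {0, 2, 4, 6, 8}, which `step` maps to itself. On an infinite lattice the limit $T^{\omega}$ need not be reached in finitely many steps, so this loop is only a finite-case illustration.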

Denote the set of all graded disjunctions of $\mathcal{L}$ as $Dis_{\mathcal{L}}$. Let $P$ be a fuzzy
disjunctive program of $\mathcal{L}$. We now generalise the hyperresolution operator $\mathcal{C}_P$,
proposed in [4], which computes over $Dis_{\mathcal{L}}$. To sketch how the operator works,
let us consider Fig. 1. An input of $\mathcal{C}_P$ comprises:

• an implication factor $(a_1 \vee \cdots \vee a_n \leftarrow b_1 \,\&\, \cdots \,\&\, b_m,\ cv_0)$ of a variant of a
graded implication in the program $P$ and

• disjunctive factors $(c_1^i \vee \cdots \vee c_{l_i}^i \vee D^i,\ cv_i)$, $i = 1, \ldots, m$, of some variants of
graded disjunctions in $I \subseteq Dis_{\mathcal{L}}$.

Each disjunctive factor is divided into two parts. The first parts enter into the
unification with the body of the input implication, and the rest together with
the head of the implication create the output disjunction, which is instantiated
by a regular extension $\theta'$ of the most general unifier $\theta$ of the framed columns.
The output is formed of a disjunctive factor of the output disjunction and of the
resulting truth value $cv_0 * cv_1 * \cdots * cv_m$. We next give a more formal treatment:

Definition 2. Let $P$ be a fuzzy disjunctive program of $\mathcal{L}$ and $I \subseteq Dis_{\mathcal{L}}$.

A graded disjunction $(D, c)$ of $\mathcal{L}$ is said to be an immediate consequence of $I$
and $P$ iff there exist

• $(H \leftarrow b_1 \,\&\, \cdots \,\&\, b_m,\ cv_0)$, an implication factor of a variant of a graded
implication in $P$ not being a tautology;

• $(C_i \vee D_i,\ cv_i)$, $i = 1, \ldots, m$, disjunctive factors of some variants of graded
disjunctions in $I$;

• the implication and disjunctions do not share variables in common;

• $\theta = mgu((\underbrace{b_1 \vee \cdots \vee b_1}_{|C_1|}, \ldots, \underbrace{b_m \vee \cdots \vee b_m}_{|C_m|}), (C_1, \ldots, C_m))$,
$dom(\theta) = vars(b_1, C_1, \ldots, b_m, C_m)$;

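^ $X = \{x_i \mid i \in \omega\}$ is simply ordered by $\leq$, that is, $i \leq i' \Rightarrow x_i \leq x_{i'}$.

Fig. 1. $\mathcal{C}_P$-operator.
[Figure: the input consists of an implication $(a_1 \vee \cdots \vee a_n \leftarrow b_1 \,\&\, \cdots \,\&\, b_m,\ cv_0)$ and disjunctions $(c_1^i \vee \cdots \vee c_{l_i}^i \vee D^i,\ cv_i)$; each body atom $b_i$ is unified with the atoms $c_1^i, \ldots, c_{l_i}^i$ by $\theta$; the output is a disjunctive factor of $(a_1 \vee \cdots \vee a_n \vee D^1 \vee \cdots \vee D^m)\theta'$ with truth value $cv_0 * cv_1 * \cdots * cv_m$.]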

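• $\theta'$, its regular extension to $vars(H \leftarrow b_1 \,\&\, \cdots \,\&\, b_m) \cup \bigcup_{i=1}^{m} vars(C_i \vee D_i)$;
such that

• $D$ is a disjunctive factor of $Atom(H\theta') \cup \bigcup_{i=1}^{m} Atom(D_i\theta')$; and

• $c = cv_0 * cv_1 * \cdots * cv_m$.

The hyperresolution operator $\mathcal{C}_P$ is defined as follows: $\mathcal{C}_P : 2^{Dis_{\mathcal{L}}} \to 2^{Dis_{\mathcal{L}}}$,

$\mathcal{C}_P(I) = \{(D, c) \mid (D, c) \text{ is an immediate consequence of } I \text{ and } P\}.$

From Definition 2, we can easily see that $\mathcal{C}_P$ is monotonic and $\omega$-continuous.
So, the fixpoint semantics is based on the least fixpoint $\mathcal{C}_P^{\omega}$ of the hyperresolution
operator:

Definition 3 (Fixpoint semantics). Let $P$ be a fuzzy disjunctive program of
$\mathcal{L}$.

$\mathcal{FS}(P) = \{(D, c) \mid (D, c) \text{ is a graded disjunction of } \mathcal{L},\ c \leq \bigvee \{c' \mid (D', c') \in \mathcal{C}_P^{\omega},\ \vartheta \in Subst_{\mathcal{L}},\ D'\vartheta \sqsubseteq D\}\}.$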
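
The operator can be illustrated on a ground (variable-free) program, where unification reduces to syntactic equality of atoms and a disjunctive factor is just a set of atoms. The sketch below makes several simplifying assumptions that are not part of the paper: programs are ground, truth degrees are rationals in the unit interval, the t-norm is the product, and all names (`immediate_consequences`, `least_fixpoint`, the example program) are hypothetical.

```python
from fractions import Fraction
from itertools import product

def immediate_consequences(program, interp, tnorm):
    """One application of the ground operator to the set interp of graded disjunctions."""
    out = set()
    for head, body, cv0 in program:
        if head & set(body):  # tautologies are excluded by Definition 2
            continue
        # For each body atom b_i: the graded disjunctions in interp that contain b_i.
        choices = [[(d, cv) for d, cv in interp if b in d] for b in body]
        for combo in product(*choices):
            atoms, degree = set(head), cv0
            for b, (d, cv) in zip(body, combo):
                atoms |= d - {b}            # the part D_i left after matching b_i
                degree = tnorm(degree, cv)  # cv_0 * cv_1 * ... * cv_m
            out.add((frozenset(atoms), degree))
    return out

def least_fixpoint(program, tnorm, max_steps=100):
    """Iterate from the empty set; max_steps guards against non-terminating chains."""
    interp = set()
    for _ in range(max_steps):
        nxt = immediate_consequences(program, interp, tnorm)
        if nxt == interp:
            return interp
        interp = nxt
    raise RuntimeError("no fixpoint reached within max_steps")

half = Fraction(1, 2)
program = [
    (frozenset({"p", "q"}), (), Fraction(4, 5)),  # (p v q <- , 4/5)
    (frozenset({"r"}), ("p",), half),             # (r <- p, 1/2)
    (frozenset({"r"}), ("q",), half),             # (r <- q, 1/2)
]
result = least_fixpoint(program, lambda a, b: a * b)
```

In this example the disjunction r is derived with degree 4/5 · 1/2 · 1/2 = 1/5, because both implications are needed to eliminate p and q; with an idempotent truth function such as min the degree would not drop on the second use.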

We close the section with an equivalence theorem:

Theorem 1 (Equivalence Theorem). Let $P$ be a fuzzy disjunctive program
of $\mathcal{L}$. For any graded disjunction $(D, c)$ of $\mathcal{L}$,

$P \models (D, c)$ if and only if $c \leq \bigvee \{c' \mid (D', c') \in \mathcal{C}_P^{\omega},\ \vartheta \in Subst_{\mathcal{L}},\ D'\vartheta \sqsubseteq D\}$.

Proof. See http://www.ii.fmph.uniba.sk/~guller/res04c.ps. □

As a consequence, we obtain the coincidence of the proposed semantics:

Corollary 1. Let $P$ be a fuzzy disjunctive program of $\mathcal{L}$.

$\mathcal{DS}(P) = \mathcal{FS}(P)$.

4 Procedural Semantics 

A most important aim of a resolution procedure is to provide a computed answer
satisfying some suitable logical conditions for a query and program. In the case of
Horn programs and SLD-resolution, a computed answer for a query $\leftarrow G$ ($G$ is a
conjunction of atoms) from a program $P$ is a substitution $\vartheta$ such that $G\vartheta$ is
a common logical consequence of $P$ and $G$. Concerning disjunctive programs, our
intention is to compute all disjunctions being common logical consequences of a
program and query disjunction. For this reason, we should consider not only the
instantiations of the query disjunction, but also some 'suitable' disjunctions being
subsumed by these instantiations. For example, let $P = \{p(f(x)) \vee q(f(x)) \vee s(x)\}$
and a query be of the form $\leftarrow p(y) \vee r(y)$. A computed answer will consist
of the substitution $\{y/f(x), x/x\} = mgu(p(y), p(f(x)))$ and of the 'remainder'
disjunction $q(f(x)) \vee s(x)$. Indeed, the resulting disjunction $q(f(x)) \vee s(x) \vee
(p(y) \vee r(y))\{y/f(x), x/x\}$, formed of the instantiated query disjunction and of
the 'remainder' disjunction, is a common logical consequence of $P$ and the query
disjunction.

In order to compute such compound answers, ULSLD-resolution has been
developed for fuzzy disjunctive programs on Boolean lattices in [3]. The abbreviation
ULSLD stands for UnLimited Selection rule driven Linear resolution for
Disjunctions. To outline how the resolution works, let us consider Fig. 2, where
an unlimited fuzzy derivation step is drawn. Let $D_q$ be the disjunction selected
by some selection rule $R$ from the old goal. We choose some sequence of atoms
$d_1, \ldots, d_{|H|}$ from $D_q$ and unify the disjunction $d_1 \vee \cdots \vee d_{|H|}$ with the subdisjunction
$H$ from the head of the input implication. Let $\theta$ be some most general
unifier. Then the resulting new goal is formed of the new subgoals of the form
$D_q \vee b_i$ and the remaining ones $D_i$, $i \neq q$. All the subgoals are instantiated by a
regular extension $\theta'$ of $\theta$ to the variables appearing in the old goal and the input
implication.^ A new part of an unlimited derivation step is a step remainder
disjunction. In our example, the step remainder disjunction $Z\theta'$ consists of the
remaining atoms from the head of the input implication instantiated by $\theta'$.

Let $P$ be a fuzzy disjunctive program and $\leftarrow D_1, \ldots, D_k$ a goal ($D_i$ are disjunctions
of atoms). A refutation (computed) answer will consist of a refutation
answer substitution $\theta$, a refutation remainder $(rd_1, \ldots, rd_k)$ ($rd_i$ are disjunctions
of atoms), and moreover of a refutation truth vector $(c_1, \ldots, c_k)$ where $c_i \in L$.
Each remainder disjunction $rd_i$ is associated with the corresponding subgoal $D_i$.
By the remainder disjunctions, we supplement the subgoals instantiated with
the substitution $\theta$. Also, each truth value $c_i$ is assigned to the corresponding
subgoal $D_i$ so that

$(rd_i \vee D_i\theta,\ c_i)$

is a common fuzzy logical consequence of the program $P$ and the graded subgoal
(disjunction) $(D_i, c_i)$.

^ They do not share variables in common.

Fig. 2. Unlimited fuzzy derivation step.
[Figure: the old goal $\leftarrow D_1, \ldots, D_{q-1}, D_q, D_{q+1}, \ldots, D_k$ and the implication $Z \vee H \leftarrow b_1 \,\&\, \cdots \,\&\, b_m$ are combined by unifying $H$ with $d_1 \vee \cdots \vee d_{|H|}$ (mgu $\theta$); the new goal is $\leftarrow (D_1, \ldots, D_{q-1}, D_q \vee b_1, \ldots, D_q \vee b_m, D_{q+1}, \ldots, D_k)\theta'$ and the step remainder disjunction is $Z\theta'$.]

We now generalise ULSLD-resolution to the case of residuated lattices:

A finite sequence $D_1, \ldots, D_k$, $k \geq 0$, of disjunctions of atoms of $\mathcal{L}$ is called a
goal of $\mathcal{L}$. We denote the goal by $\leftarrow D_1, \ldots, D_k$. The empty goal is denoted as
$\Box$.

By a selection rule $R$ we mean a function which returns an index $q$, $1 \leq q \leq k$,
for a non-empty goal $\leftarrow D_1, \ldots, D_k$, $k \geq 1$. For the empty goal $\Box$, we put
$R(\Box) = 0$.

Definition 4. For a given graded implication $(Im, c)$ of $\mathcal{L}$, a selection rule $R$,
and a goal $\leftarrow D_1, \ldots, D_k$, $k \geq 1$, of $\mathcal{L}$ where $Im$ and $\leftarrow D_1, \ldots, D_k$ do not share
variables in common, an unlimited fuzzy derivation step is defined as follows:

Let

• $Im$ be of the form $Z \vee H \leftarrow b_1 \,\&\, \cdots \,\&\, b_m$, $m \geq 0$;

• $R(\leftarrow D_1, \ldots, D_k) = q$, $1 \leq q \leq k$;

• there exist $\theta = mgu(H, d_1 \vee \cdots \vee d_{|H|})$ where $d_i \in Atom(D_q)$, $i = 1, \ldots, |H|$,
and $dom(\theta) = vars(H, d_1 \vee \cdots \vee d_{|H|})$;

• $\theta'$ be a regular extension of $\theta$ to $vars(Im, \leftarrow D_1, \ldots, D_k)$.

The resulting new goal is of the form

$\leftarrow (D_1, \ldots, D_{q-1}, D_q \vee b_1, \ldots, D_q \vee b_m, D_{q+1}, \ldots, D_k)\theta'$;

the step remainder disjunction is defined as $Z\theta'$.

The entire derivation step is denoted as:

$\leftarrow D_1, \ldots, D_k \ \xrightarrow{\ Z\theta' \,\mid\, (Im, c),\ \theta',\ R\ }\ \leftarrow (D_1, \ldots, D_{q-1}, D_q \vee b_1, \ldots, D_q \vee b_m, D_{q+1}, \ldots, D_k)\theta'$.

See Fig. 2.



Definition 5. Let $P$ be a fuzzy disjunctive program of $\mathcal{L}$, $G_0 = \leftarrow D_1, \ldots, D_k$,
$k \geq 0$, be a goal of $\mathcal{L}$, and $R$ a selection rule.

A ULSLD-derivation for $G_0$ of length $n$, $n \geq 0$, is a finite sequence of goals
$G_0, \ldots, G_n$ satisfying

for $0 \leq i < n$: $G_i \ \xrightarrow{\ rm_i \,\mid\, (Im_i, c_i),\ \theta'_i,\ R\ }\ G_{i+1}$

where $(Im_i, c_i)$ is an implication factor of a variant of a graded implication in
$P$.



Definition 6. Let $P$ be a fuzzy disjunctive program of $\mathcal{L}$, $G_0 = \leftarrow D_1, \ldots, D_k$,
$k \geq 0$, a goal of $\mathcal{L}$, $G_0, \ldots, G_n$ be a derivation for $G_0$, and $R$ a selection rule.

If $G_n = \Box$, the derivation is called a refutation for $G_0$. In this case, we define

• the refutation answer substitution $\vartheta$, $dom(\vartheta) = vars(G_0)$;

• the refutation remainder, a tuple of disjunctions $(rd_1, \ldots, rd_k)$ where $rd_i$ are
of $\mathcal{L}$; and

• the refutation truth vector $(cv_1, \ldots, cv_k)$, $cv_i \in L$,

by recursion on the length $n$ of the refutation:

• If $n = 0$, then $G_0 = \Box$ and $k = 0$. The refutation answer substitution $\vartheta = \emptyset$,
the refutation remainder and truth vector are $()$.

• If $n \geq 1$, then $G_0 \neq \Box$ and $k \geq 1$.^ Let $R(\leftarrow D_1, \ldots, D_k) = q$, $1 \leq q \leq k$,
and $Im_0$ be of the form $H \leftarrow b_1 \,\&\, \cdots \,\&\, b_m$. Then the first derivation step is
of the form:

$G_0 \ \xrightarrow{\ rm_0 \,\mid\, (Im_0, c_0),\ \theta'_0,\ R\ }\ G_1$,

$G_1 = \leftarrow (D_1, \ldots, D_{q-1}, D_q \vee b_1, \ldots, D_q \vee b_m, D_{q+1}, \ldots, D_k)\theta'_0$.

Denote the rest of the refutation, of length $n - 1$, as follows:

$G_1 \ \xrightarrow{\ \vartheta',\ (rd'_1, \ldots, rd'_{q-1}, rd'_{q_1}, \ldots, rd'_{q_m}, rd'_{q+1}, \ldots, rd'_k),\ (cv'_1, \ldots, cv'_{q-1}, cv'_{q_1}, \ldots, cv'_{q_m}, cv'_{q+1}, \ldots, cv'_k)\ }\ \Box$

where, by recursion,

- $\vartheta'$, $dom(\vartheta') = vars(G_1)$, is the refutation answer substitution;

- $(rd'_1, \ldots, rd'_{q-1}, rd'_{q_1}, \ldots, rd'_{q_m}, rd'_{q+1}, \ldots, rd'_k)$ is the refutation remainder;

- $(cv'_1, \ldots, cv'_{q-1}, cv'_{q_1}, \ldots, cv'_{q_m}, cv'_{q+1}, \ldots, cv'_k)$ is the refutation truth vector

for $G_1$.

Denote $Vr = vars(rd'_1, \ldots, rd'_{q-1}, rd'_{q_1}, \ldots, rd'_{q_m}, rd'_{q+1}, \ldots, rd'_k)$. Let $\vartheta^*$ be a
regular extension of $\vartheta'$ to $range(\theta'_0) \supseteq vars(G_1)$ such that

$range(\vartheta^*|_{range(\theta'_0) - vars(G_1)}) \cap Vr = \emptyset$.

Then

- $\vartheta = \theta'_0|_{vars(G_0)} \circ \vartheta^*$;

- $(rd_1, \ldots, rd_k) = (rd'_1, \ldots, rd'_{q-1}, rd''_q, rd'_{q+1}, \ldots, rd'_k)$ where $rd''_q$ is a disjunctive
factor of $rm_0\vartheta^* \vee rd'_{q_1} \vee \cdots \vee rd'_{q_m}$; and

- $(cv_1, \ldots, cv_k) = \ldots$

^ We may execute derivation steps only for non-empty goals.

The entire refutation is denoted as

$G_0 \ \xrightarrow{\ \vartheta,\ (rd_1, \ldots, rd_k),\ (cv_1, \ldots, cv_k)\ }\ \Box$.

Finally, we state that ULSLD-resolution is sound and complete with respect to the
least fixpoint $\mathcal{C}_P^{\omega}$.

Theorem 2 (Soundness and Completeness of ULSLD). Let $P$ be a fuzzy
disjunctive program of $\mathcal{L}$, $R$ a selection rule, and $G = \leftarrow D_1, \ldots, D_k$, $k \geq 0$, be
a goal of $\mathcal{L}$.

There exist

• $(C_i \vee Z_i,\ cv_i)$, $i = 1, \ldots, k$, disjunctive factors of some variants of graded
disjunctions in $\mathcal{C}_P^{\omega}$, where the goal and disjunctions do not share variables in common;

• $\theta = mgu((d_1^1 \vee \cdots \vee d_{|C_1|}^1, \ldots, d_1^k \vee \cdots \vee d_{|C_k|}^k), (C_1, \ldots, C_k))$,
where $d_j^i \in Atom(D_i)$ and $dom(\theta) = vars(d_1^1, \ldots, d_{|C_k|}^k, C_1, \ldots, C_k)$;

• $\theta'$, its regular extension to $vars(G) \cup \bigcup_{i=1}^{k} vars(C_i \vee Z_i)$;

if and only if there exists a ULSLD-refutation for $G$ with refutation answer
substitution $\vartheta$ and refutation remainder $(rd_1, \ldots, rd_k)$

so that $\vartheta = \theta'|_{vars(G)}$, $rd_i = Z_i\theta'$, and the $i$-th entry of the refutation truth
vector equals $cv_i$ for $i = 1, \ldots, k$.

Proof. See http://www.ii.fmph.uniba.sk/~guller/res04c.ps. □

The procedural semantics can be proposed by means of ULSLD-refutation:

Definition 7 (Procedural semantics). Let $P$ be a fuzzy disjunctive program
of $\mathcal{L}$.

Using Theorem 2, we conclude:

Corollary 2. Let $P$ be a fuzzy disjunctive program of $\mathcal{L}$.

$\mathcal{DS}(P) = \mathcal{PS}(P) = \mathcal{FS}(P)$.

Thereby we have reached the coincidence of the presented semantics.

5 Conclusions 

In the paper, we have proposed a procedural semantics for fuzzy disjunctive 
programs. We considered the non-idempotent t-norm * as a truth function for 
the strong conjunction &. The coincidence of the proposed procedural semantics 
and the generalised declarative, fixpoint semantics from [4] has been reached. 
Abstract. Equality logic with uninterpreted functions is used for
proving the equivalence or refinement between systems (hardware
verification, compiler translation, etc.). Current approaches for deciding
this type of formula transform an equality formula into a larger
propositional one, to which any standard SAT checker can
be applied. We give an approach for deciding satisfiability of equality
logic formulas (E-SAT) in conjunctive normal form. Central to our
approach is a single proof rule called ER. For this single rule we prove
soundness and completeness. Based on this rule we propose a complete
procedure for E-SAT and prove its correctness. Applying our procedure
to a variation of the pigeon hole formula yields a polynomial complexity,
in contrast to earlier approaches to E-SAT.

Keywords: Equality logic, satisfiability, resolution. 



1 Introduction 

The logic of equality with uninterpreted functions (UIFs) has been proposed for
verifying hardware [5]. This type of logic is mainly used for proving equivalence
between systems. When verifying equivalence between two formulas it is often
possible to abstract away functions, replacing them with UIFs. In [1] Ackermann
showed that the problem of deciding the validity of a formula in equality logic
with UIFs can be reduced to checking satisfiability of formulas without function
symbols. These formulas are called equality logic formulas. Bryant et al. [3]
presented an alternative approach.

In the past several years various procedures for checking satisfiability of
equality logic formulas have been suggested. Barrett et al. [2] proposed a
decision procedure based on computing congruence closure in combination with
case splitting. Goel et al. [6] and Bryant et al. [4] transform equality
logic to propositional logic by adding transitivity constraints and analyzing
which transitivity properties may be relevant. In the approach called range
allocation [8,11] the formula structure is analyzed to define a small domain for each
variable. Then a standard BDD-based tool is used to check satisfiability of the
formula under the domain. Another approach is given in [7]. This approach is
based on BDD computation, with some extra rules for dealing with transitivity.

We call the problem of deciding whether a given equality formula is satisfiable
E-SAT, by analogy with propositional satisfiability, which is called SAT.
Analogously to propositional logic, every equality logic formula can be
transformed to an equality formula in conjunctive normal form (E-CNF) such that
the original formula is satisfiable if and only if the E-CNF is satisfiable. Hence
we may, and shall, concentrate on satisfiability of E-CNFs.

We present a single-rule inference system for equality logic. Our rule, called 
ER, incorporates some ideas similar to paramodulation and resolution. But it 
is different from them and from other proof systems for first order logic with 
equality such as hyperresolution, etc. Special axioms for equality, i.e. reflexivity, 
symmetry and transitivity axioms, are not required to be added to the original 
set of clauses. Since the equality substitution mechanism is not applied, ER does 
not generate new literals. The rule is sound and complete. 

A decision procedure is an essential component of formal verification systems.
We propose a procedure based on the ER rule. Since checking satisfiability of
equality formulas is NP-complete, a general efficient algorithm is not expected
to exist. As an example we apply this procedure to a formula parameterized
by n that is a variation of the well-known pigeon hole formula. It turns out that
our procedure proves unsatisfiability of this formula efficiently, with complexity
quadratic in n, while standard approaches fail to prove unsatisfiability
of this formula efficiently.

Our paper is organized as follows. In Section 2 we give basic definitions. In
Section 3 we present a general theorem globalizing a local commutation criterion
for different proof systems. In Section 4 we present the ER rule, and we prove its
soundness and completeness in Section 5. The E-SAT procedure is described in
Section 6. In Section 7 we prove soundness and completeness of the procedure, in
Section 8 we give an example, and some concluding remarks are in Section 9. In
this version of the paper some details in proofs are omitted; all full proofs can
be found in [13].

2 Basic Definitions and Preliminaries 

Any formula in equality logic, as in propositional logic, can be straightforwardly
converted to an equivalent E-CNF. In the worst case the size of the result is
exponential in the size of the original formula. This can be avoided by adding
extra variables. The well-known Tseitin transformation [12] transforms an
arbitrary propositional formula to a CNF in such a way that the original formula
is satisfiable if and only if the CNF is satisfiable. Both the size of the resulting
CNF and the complexity of the transformation procedure are linear in the size
of the original formula. In this transformation new propositional symbols are
introduced, so applying it directly to equality formulas will yield a CNF in which
the atoms are both equalities and propositional variables. However, if we have n
propositional variables $p_1, \ldots, p_n$ we can introduce $n + 1$ fresh domain variables
$a, x_1, \ldots, x_n$ and replace every propositional variable $p_i$ by the equality $x_i \approx a$.
In this way satisfiability is easily seen to be maintained. Hence we may and shall
restrict ourselves to satisfiability of E-CNFs.
An E-CNF $F$ is a conjunction of clauses. A clause $C$ is a disjunction of
literals. The empty clause is denoted by $\bot$. A literal $l$ is an atom $x \approx y$ or a
negated atom $x \not\approx y$, where $x$ and $y$ belong to a set of variables $V$. We consider
$x \approx y$ and $y \approx x$ as the same atom. Since conjunction and disjunction are
associative and commutative, an E-CNF can be viewed as a set of sets of literals.
We denote by $V_F$ the set of all variables which occur in $F$ and by $L_F$ the set of
all literals which occur in $F$.

A domain $D$ is defined to be a non-empty set. For every domain we define
an assignment as a function $A : V \to D$. For an assignment $A$ we define the
corresponding interpretation $I_A$ on literals by:

$I_A(x \approx y) = true$ if $A(x) = A(y)$

$I_A(x \approx y) = false$ if $A(x) \neq A(y)$

$I_A(x \not\approx y) = \neg I_A(x \approx y)$

We define $I_A(C) = true$ if $I_A(l) = true$ for some $l \in C$, otherwise $I_A(C) =
false$, and $I_A(F) = true$ if $I_A(C) = true$ for every $C \in F$, otherwise $I_A(F) = false$.

An E-CNF $F$ is called satisfiable if $I_A(F) = true$ for some assignment $A$,
otherwise it is called unsatisfiable.
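
These definitions translate directly into a small evaluator. In the sketch below the encoding is an assumption made for illustration: a literal is a triple `(x, y, equal)` with `equal=True` for $x \approx y$ and `equal=False` for $x \not\approx y$, a clause is a list of literals, and an E-CNF is a list of clauses. The brute-force check relies on the standard observation that a formula over n variables never needs more than n distinct domain values; it is exponential and only meant for tiny examples.

```python
from itertools import product

def holds(literal, assignment):
    """I_A on a literal: compare the assigned values and match against the sign."""
    x, y, equal = literal
    return (assignment[x] == assignment[y]) == equal

def satisfies(assignment, cnf):
    """I_A(F): every clause contains at least one literal that holds."""
    return all(any(holds(lit, assignment) for lit in clause) for clause in cnf)

def brute_force_satisfiable(cnf):
    """Try every assignment into a domain with as many values as there are variables."""
    variables = sorted({v for clause in cnf for (x, y, _) in clause for v in (x, y)})
    domain = range(max(1, len(variables)))
    return any(
        satisfies(dict(zip(variables, values)), cnf)
        for values in product(domain, repeat=len(variables))
    )

# x = y, y = z, x != z: unsatisfiable by transitivity.
chain = [[("x", "y", True)], [("y", "z", True)], [("x", "z", False)]]
# x = y and (y = z or x != z): satisfiable, e.g. with all variables equal.
relaxed = [[("x", "y", True)], [("y", "z", True), ("x", "z", False)]]

chain_sat = brute_force_satisfiable(chain)
relaxed_sat = brute_force_satisfiable(relaxed)
```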

Since $x \approx x$ can be replaced by true, and $x \not\approx x$ can be replaced by false, we
will consider E-CNFs not containing literals of the shape $x \approx x$ and $x \not\approx x$.

3 Commutation of Proof Systems 

In this section we present the desired commutation result for arbitrary proof
systems. It will be used in the following sections for proving completeness of the ER
rule and of the decision procedure based on the rule.

Here a proof system may be anything by which new statements, e.g. clauses,
may be deduced from existing statements. For such a proof system $s$ we use the
notation $F \to_s G$ for $G = F \cup \{C\}$, where $C$ is a statement deduced from $F$ by
the proof system $s$.

For every relation $\to$ we write $\to^*$ for its reflexive transitive closure, i.e., we
write $F \to^* G$ if $F_0, \ldots, F_n$ exist for $n \geq 0$ satisfying

$F = F_0 \to F_1 \to \cdots \to F_n = G$.

We write $F \sqsubseteq G$ if for each $C \in G$ there is $D \in F$ such that $D \subseteq C$.

Definition 1. Suppose

• $s$ is a proof system,

• $F \to_s F'$ for some sets of statements $F$ and $F'$,

• $G$ is an arbitrary set of statements such that $G \sqsubseteq F$.

We say that $s$ is a $\sqsubseteq$-monotonic proof system if there is a set of statements $G'$
such that

• $G \to_s^* G'$, and

• $G' \sqsubseteq F'$.
Now we will give the formal definition of local commutation.

Definition 2. (local commutation) Let $s_1$ and $s_2$ be proof systems and

$F \to_{s_1} F' \to_{s_2} F''$.

We say that a proof system $s_1$ commutes over a proof system $s_2$ if for some finite
$n$ there exist $G_1, \ldots, G_n, G$ such that

• $F \to_{s_2} G_i$ for each $i \in \{1, \ldots, n\}$, and

• $\bigcup_{i=1}^{n} G_i \to_{s_1}^* G$, where $G \sqsubseteq F''$.

Local commutation means that for two proof systems $s_1$ and $s_2$, doing one step
of $s_1$ followed by one step of $s_2$ can be simulated by first doing $s_2$ and then $s_1$.

Theorem 1. (global commutation) Let $s$ be a union of $\sqsubseteq$-monotonic proof systems
$s_1, \ldots, s_n$ such that $s_i$ commutes over $s_j$ for each $i > j$. Suppose $F \to_s^* G$
for some $F$ and $G$; then there are $F_1, \ldots, F_n$ such that

$F \to_{s_1}^* F_1 \to_{s_2}^* F_2 \to_{s_3}^* \cdots \to_{s_n}^* F_n$,

where $F_n \sqsubseteq G$.

Proof. The proof is given in [13].

4 Resolution for Equality Logic 

An important notion in this paper is a contradictory cycle.

Definition 3. A contradictory cycle $\theta$ is defined to be a set of literals

$x_1 \approx x_2, \ldots, x_{n-1} \approx x_n,\ x_1 \not\approx x_n$,

where $x_1, \ldots, x_n$ are distinct variables, and $n > 1$.

When drawing a graph consisting of the variables from an E-CNF $F$ as
nodes, equalities contained in $F$ as solid edges, and disequalities contained in $F$
as dashed edges, then a contradictory cycle of $F$ corresponds exactly to a cycle
in this graph in which one edge is dashed and all other edges are solid. For a
given E-CNF such a graph is easily made, and such cycles are easily established
by looking for solid paths from one end of a dashed edge to the other end.

The principle of a contradictory cycle enables us to establish the following
resolution-based inference rule for equality logic:

ER: $\dfrac{\{x_1 \approx x_2\} \cup C_1,\ \ldots,\ \{x_{n-1} \approx x_n\} \cup C_{n-1},\ \{x_1 \not\approx x_n\} \cup C_n}{C_1 \cup \cdots \cup C_n}$

where $x_1 \approx x_2, \ldots, x_{n-1} \approx x_n, x_1 \not\approx x_n$ is a contradictory cycle. The newly
obtained clause $C_1 \cup \cdots \cup C_n$ is called an ER-resolvent.

Clearly for every contradictory cycle $\theta = \{x_1 \approx x_2, \ldots, x_{n-1} \approx x_n, x_1 \not\approx x_n\}$
we have a corresponding instance of ER.

We write $F \to_e F_e$ if $F_e = F \cup \{C\}$, where $C$ is an ER-resolvent. We write
$F \to_\theta F_e$ if ER is applied for a fixed contradictory cycle $\theta$, and the transition
from $F$ to $F_e$ in this case is called a $\theta$-step.
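
As a small worked instance of the rule (the side literals $a \approx b$ and $c \not\approx d$ are invented for this illustration), take the contradictory cycle $\theta = \{x \approx y,\ y \approx z,\ x \not\approx z\}$ and the three clauses $\{x \approx y, a \approx b\}$, $\{y \approx z\}$, and $\{x \not\approx z, c \not\approx d\}$. Removing the cycle literal from each clause leaves $C_1 = \{a \approx b\}$, $C_2 = \emptyset$, and $C_3 = \{c \not\approx d\}$, so one $\theta$-step gives

$\dfrac{\{x \approx y\} \cup \{a \approx b\},\quad \{y \approx z\} \cup \emptyset,\quad \{x \not\approx z\} \cup \{c \not\approx d\}}{\{a \approx b,\ c \not\approx d\}}$

The resolvent contains only literals that already occur in the premises, which matches the earlier remark that ER does not generate new literals. If all three premises were unit clauses, the resolvent would be the empty clause $\bot$.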
5 Soundness and Completeness of the ER Rule

The following theorems show that the ER rule is sound and complete. It is well
known that resolution together with paramodulation is complete for FOL with
equality [9]. Using this fact and Theorem 1 on the global commutation property of
proof systems, we will prove completeness of the ER rule. Proving soundness
is straightforward.

Theorem 2. (soundness) Let $F \to_e F_e$. Then $F$ is satisfiable iff $F_e$ is satisfiable.

Proof. ($\Rightarrow$) Suppose $F$ is a set of clauses that is satisfiable by some assignment
$A$. Let the clauses

$\{l_1\} \cup C_1, \ldots, \{l_n\} \cup C_n$

be the members of $F$, where the set $\theta = \{l_1, \ldots, l_n\}$ is a contradictory cycle.
Obviously, the set $\theta$ is unsatisfiable but any of its proper subsets is satisfiable. Now, $A$ does
not satisfy at least one literal from $\theta$. Let us say that $I_A(l_1) = false$. Then
$I_A(C_1) = true$, as $A$ satisfies $\{l_1\} \cup C_1$. Then $A$ also satisfies $C_1 \cup \cdots \cup C_n$.

($\Leftarrow$) Let $F_e$ be satisfiable. Then $F$ is satisfiable as a subset of a satisfiable set
of clauses. □

As we mentioned above, the combination of resolution and paramodulation
is complete for FOL with equality.

Paramodulation: $\dfrac{\{l\} \cup C,\quad \{y \approx z\} \cup D}{\{l[y := z]\} \cup C \cup D}$

where $l[y := z]$ is the literal $l$ in which $y$ is replaced by $z$.

For equality logic resolution can be presented as follows.

Resolution: $\dfrac{\{x \approx y\} \cup C,\quad \{x \not\approx y\} \cup D}{C \cup D}$

Let $F' = F \cup \{C\}$. We will use the notation $F \to_p F'$ if $C$ was derived from
$F$ using paramodulation and $F \to_r F'$ if $C$ was obtained using resolution.

It is easily observed that paramodulation and the ER rule are $\sqsubseteq$-monotonic.
In order to use completeness of resolution and paramodulation for FOL with
equality we have to prove that paramodulation and the ER rule satisfy the local
commutation property, i.e. Definition 2.

Lemma 1. Paramodulation commutes over ER.

Proof. The proof is given in [13].

Theorem 3. (Completeness) An E-CNF F is unsatisfiable iff there is a deriva- 
tion of the empty clause from F using ER. 
Proof. ($\Rightarrow$) Suppose $F$ is an unsatisfiable set of clauses. Then there is a derivation
of the empty clause from $F$ using both paramodulation and resolution. Since
resolution is the particular case of the ER rule for $n = 2$, by Lemma 1
there is a derivation of the empty clause from $F$ that first applies the ER rule and
then paramodulation. Since the empty clause cannot be derived using just
paramodulation, there is a derivation of the empty clause from $F$ using ER.

($\Leftarrow$) Assume that the empty clause can be derived from the E-CNF by ER.
Then by Theorem 2 the original set of clauses is unsatisfiable. □

6 The E-SAT Procedure 

In this section we shall describe the E-SAT procedure and prove its correctness. 

Given a nonempty E-CNF containing nonempty clauses, the E-SAT procedure
forms the set $\Theta$ of all contradictory cycles and then repeats the following steps.

— Choose a contradictory cycle $\theta \in \Theta$ and remove $\theta$ from $\Theta$.

— Add all possible clauses derived from $F$ by the ER rule over $\theta$.

We give a precise version of the procedure. 



Procedure E-SAT(F);
begin
    Θ := ContrCycle(F);
    while (Θ ≠ ∅) do
    begin
        choose θ ∈ Θ;
        Θ := Θ \ {θ};
        F := F ∪ ER(F, θ);
        if ⊥ ∈ F return(unsatisfiable);
    end
    return(satisfiable);
end



Fig. 1. The E-SAT procedure 



In this procedure the function ContrCycle($F$) forms the set of all possible
contradictory cycles. The function ER($F$, $\theta$) forms the set of clauses derived from
$F$ by all possible $\theta$-steps.

The procedure ends when either the empty clause is derived or no contradictory
cycle is left. If the empty clause is derived, the output of the procedure is
"unsatisfiable". If the empty clause is not derived during the procedure, the
output is "satisfiable".

The search space of the saturation-based procedures can grow very rapidly. 
The procedure becomes more efficient when we have criteria to remove redundant 
clauses from the search space. One can use subsumption introduced by Robinson 
[10] for general resolution. 
Example 1. As an example we take the formula from [8] that arose during the
process of translation validation. After abstracting functions and performing the
Ackermann reduction, the following E-CNF is obtained:

$$F = (x_1 \not\approx x_2 \lor x_3 \not\approx x_4 \lor y_1 \approx y_2) \land (y_1 \not\approx y_3 \lor y_2 \not\approx y_4 \lor z_1 \approx z_2) \land y_1 \approx y_3 \land y_2 \approx y_4 \land z_1 \approx z_3 \land z_2 \not\approx z_3.$$

Current approaches to proving unsatisfiability of such a formula first
transform it to a propositional formula and then apply a standard SAT
checker. We will show how unsatisfiability of $F$ can be proven by the E-SAT
procedure.

(1) $x_1 \not\approx x_2 \lor x_3 \not\approx x_4 \lor y_1 \approx y_2$

(2) $y_1 \not\approx y_3 \lor y_2 \not\approx y_4 \lor z_1 \approx z_2$

(3) $y_1 \approx y_3$

(4) $y_2 \approx y_4$

(5) $z_1 \approx z_3$

(6) $z_2 \not\approx z_3$

(7) $y_2 \not\approx y_4 \lor z_1 \approx z_2$ (2,3)

(8) $z_1 \approx z_2$ (4,7)

(9) $\bot$ (5,6,8)

One can see that after three ER-steps the empty clause was derived.
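
The procedure of Fig. 1 and the derivation of Example 1 can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes that a contradictory cycle consists of one disequality $a \not\approx b$ together with a simple path of equalities from $a$ to $b$, and that an ER step over $\theta = \{l_1, \ldots, l_n\}$ combines clauses $\{l_i\} \cup C_i$ into $C_1 \cup \cdots \cup C_n$. The names `lit`, `contr_cycles`, `er` and `e_sat` are hypothetical, and no redundancy elimination is performed.

```python
from itertools import product

def lit(a, b, eq=True):
    """Literal a ≈ b (eq=True) or a ≉ b (eq=False); arguments are ordered."""
    a, b = sorted((a, b))
    return (eq, a, b)

def contr_cycles(formula):
    """All contradictory cycles (assumed shape): one disequality a ≉ b plus
    a simple path of equality literals leading from a to b."""
    lits = {l for clause in formula for l in clause}
    adj = {}
    for eq, a, b in lits:
        if eq:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    cycles = set()

    def extend(node, target, visited, path):
        if node == target:
            cycles.add(frozenset(path))
            return
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                extend(nxt, target, visited | {nxt}, path + [lit(node, nxt)])

    for eq, a, b in lits:
        if not eq:
            extend(a, b, {a}, [lit(a, b, eq=False)])
    return cycles

def er(formula, cycle):
    """Clauses derived from `formula` by all possible steps over `cycle`."""
    cycle = list(cycle)
    # For every literal of the cycle, the clauses that contain it.
    choices = [[c for c in formula if l in c] for l in cycle]
    derived = set()
    for combo in product(*choices):
        derived.add(frozenset().union(*(c - {l} for c, l in zip(combo, cycle))))
    return derived

def e_sat(clauses):
    """Sketch of procedure E-SAT: process every contradictory cycle once."""
    formula = {frozenset(c) for c in clauses}
    pending = contr_cycles(formula)
    while pending:
        theta = pending.pop()
        formula |= er(formula, theta)
        if frozenset() in formula:      # the empty clause was derived
            return "unsatisfiable"
    return "satisfiable"

# The E-CNF of Example 1.
example_1 = [
    [lit("x1", "x2", False), lit("x3", "x4", False), lit("y1", "y2")],
    [lit("y1", "y3", False), lit("y2", "y4", False), lit("z1", "z2")],
    [lit("y1", "y3")],
    [lit("y2", "y4")],
    [lit("z1", "z3")],
    [lit("z2", "z3", False)],
]
print(e_sat(example_1))  # prints "unsatisfiable"
```

The sketch enumerates cycles by exhaustive path search and keeps every derived clause, so it is only suitable for small inputs such as the example above.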

7 Soundness and Completeness of the Procedure 

We will prove the completeness of the E-SAT procedure. 

Let $\theta_1, \ldots, \theta_n$ be all contradictory cycles of an unsatisfiable E-CNF $F_0$. Based
on the completeness of the ER rule we will show that there is a finite sequence
$F_1, \ldots, F_n$ such that for each $i \in \{1, \ldots, n\}$, $F_i$ consists of all clauses contained
in $F_{i-1}$ and clauses derived from $F_{i-1}$ in one $\theta_i$-step, and $F_n$ contains the empty
clause.

First we will prove the local commutation property.

Lemma 2. Let $s_i$ be a proof system consisting of the $\theta_i$-step for $i \in \{1, 2\}$. Then $s_1$
commutes over $s_2$.

Proof. The proof is given in [13]. 



Theorem 4. Let $\{\theta_1, \ldots, \theta_n\}$ be the set of all contradictory cycles in $F$. Let
$F \to_{ER}^{*} G$. Then for some $m \le n$ there exist $F_1, \ldots, F_m$ such that

$$F \to_{\theta_1} F_1 \to_{\theta_2} \cdots \to_{\theta_m} F_m,$$

where $F_m \supseteq G$.

Proof. The proof follows immediately from Theorem 1 and Lemma 2. □



Theorem 5. Let $F$ and $G$ be E-CNFs, $\theta$ be a contradictory cycle, and $G =
F \cup ER(F, \theta)$. If $G \to_\theta G'$ then $G \supseteq G'$.

Proof. The proof is given in [13].



Theorem 6. (Soundness and completeness of the E-SAT procedure) Let $F$ be
an E-CNF. Then $F$ is unsatisfiable iff the E-SAT procedure derives the
empty clause, i.e. its output is "unsatisfiable".

Proof. ($\Rightarrow$) If $F$ is unsatisfiable then by Theorem 3 there is a derivation of the
empty clause from $F$ by the ER rule, i.e. $F \to_{ER}^{*} G$, where $\bot \in G$.

Let $\{\theta_1, \ldots, \theta_n\}$ be the set of all contradictory cycles in $F$. Then by Theorem
4 for some $m \le n$ there are $F_1, \ldots, F_m$ such that $F \to_{\theta_1} F_1 \to_{\theta_2} \cdots \to_{\theta_m} F_m$,

where $F_m \supseteq G$.

Since $\bot \in G$, we obtain that $\bot \in F_m$. By Theorem 5, $F_i = F_{i-1} \cup
ER(F_{i-1}, \theta_i)$ for each $i \in \{1, \ldots, m\}$. This implies that the empty clause can be
derived by the E-SAT procedure.

($\Leftarrow$) If there is a derivation of the empty clause by the ER rule then $F$ is
unsatisfiable by Theorem 2. □

8 Example 

As an example we consider a formula that is related to the pigeon hole formula
in propositional calculus. This formula has also been studied in [14]. Just like the
pigeon hole formula, our formula is parameterized by a number $n$, it is easily
seen to be contradictory by a meta argument, and its shape is the conjunction
of two subformulas. In our formula there are $n + 1$ variables $x_1, \ldots, x_n, y$. The
first subformula states that all values of $x_1, \ldots, x_n$ are different. The second
subformula states that the value of $y$ occurs in every subset of size $n - 1$ of
$\{x_1, \ldots, x_n\}$, hence it will occur at least twice in $\{x_1, \ldots, x_n\}$, contradicting
the property of the first subformula. Hence the total formula

$$\Phi_n = \bigwedge_{1 \le i < j \le n} x_i \not\approx x_j \;\land\; \bigwedge_{j=1}^{n} \Big( \bigvee_{i \in \{1,\ldots,n\},\, i \neq j} x_i \approx y \Big)$$

is unsatisfiable as an E-CNF. It is easy to see that $\Phi_n$ is minimally unsatisfiable,
hence in any proof of unsatisfiability all clauses have to be used. The goal
now is to prove unsatisfiability of $\Phi_n$ automatically.
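
The meta argument can be cross-checked by brute force for small $n$. The following is a minimal sketch, not part of the procedure of this paper: it assumes the standard small-model fact that an equality formula over $n + 1$ variables is satisfiable iff it is satisfiable over a domain of $n + 1$ values, and the function name `phi_n_satisfiable` is hypothetical.

```python
from itertools import product

def phi_n_satisfiable(n):
    """Try every assignment of x_1..x_n, y to the values 0..n."""
    for *xs, y in product(range(n + 1), repeat=n + 1):
        # First subformula: all x_i are pairwise different.
        all_distinct = len(set(xs)) == n
        # Second subformula: for every j, some x_i with i != j equals y.
        y_everywhere = all(
            any(xs[i] == y for i in range(n) if i != j) for j in range(n)
        )
        if all_distinct and y_everywhere:
            return True
    return False

for n in range(2, 5):
    print(n, phi_n_satisfiable(n))
```

The search space has $(n+1)^{n+1}$ assignments, so this check is only feasible for very small $n$; it is meant as a sanity check of the formula's shape, not as a decision method.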

We applied the bit vector encoding to this formula, i.e., in this formula every
$z \approx w$ is replaced by $\bigwedge_i (z_i \leftrightarrow w_i)$ for $i$ running from 1 to $\lceil \log(n + 1) \rceil$, and then
a standard SAT approach is applied to the resulting propositional formula. It
turned out that both for a BDD-based approach and a resolution-based approach
this is a hard job. For $n = 50$ or even lower a combinatorial explosion occurs.







O. Tveretina and H. Zantema 



However, by applying the approach introduced in this paper, proving
unsatisfiability of $\Phi_n$ can be done in time polynomial in $n$. It turns out that all contradictory
cycles in $\Phi_n$ are of length 3 and are of the shape $\theta_{i,j} = \{x_i \approx y, x_j \approx y, x_i \not\approx x_j\}$
for $1 \le i < j \le n$; the total number of these contradictory cycles is $n(n-1)/2$.

Now we will study the behavior of our procedure when it processes all
these contradictory cycles consecutively. Write $C_j$ for the clause $\bigvee_{i \in \{1,\ldots,n\},\, i \neq j} x_i \approx y$ for
$j = 1, \ldots, n$, and write $C_{jn}$ for the clause obtained from $C_j$ by removing $x_n \approx y$,
for $j = 1, \ldots, n - 1$. As a first contradictory cycle choose $\theta_{1,n}$. Then by applying
a $\theta_{1,n}$-step on $C_1$, $C_n$ and $x_1 \not\approx x_n$ we obtain the new clause $C_{1n}$. A
number of other $\theta_{1,n}$-steps are possible, but each of them yields a clause in which $C_{1n}$
is contained, hence will be removed. Also $C_1$ and $C_n$ are supersets of $C_{1n}$ and
will be removed. So after treating this first contradictory cycle, apart from the
inequalities only the following $n - 1$ clauses remain: $C_2, \ldots, C_{n-1}, C_{1n}$. As a
second contradictory cycle choose $\theta_{2,n}$. Applying a corresponding step on $C_2$,
$C_{1n}$ and $x_2 \not\approx x_n$ yields the new clause $C_{2n}$. Since this is a subclause of all
other clauses generated by $\theta_{2,n}$-steps, and also of $C_2$, after treating this second
contradictory cycle, apart from the inequalities only the following $n - 1$ clauses
remain: $C_3, \ldots, C_{n-1}, C_{1n}, C_{2n}$.

This pattern continues: after choosing the $(n-1)$-th contradictory cycle
$\theta_{n-1,n}$, apart from the inequalities only the following $n - 1$ clauses remain:
$C_{1n}, C_{2n}, \ldots, C_{n-1,n}$. Since now no equality involving the variable
$x_n$ occurs any more, there is no contradictory cycle any more containing the inequalities
$x_i \not\approx x_n$ for $i = 1, \ldots, n - 1$. It turns out that the remaining E-CNF is exactly
$\Phi_{n-1}$. Continuing by consecutively choosing $\theta_{1,n-1}, \theta_{2,n-1}, \ldots, \theta_{n-2,n-1}$, after $n - 2$
steps the remaining E-CNF is exactly $\Phi_{n-2}$. This goes on until the remaining
E-CNF is exactly $\Phi_2$, consisting of the three unit clauses $x_1 \approx y$, $x_2 \approx y$, $x_1 \not\approx x_2$,
from which the empty clause is derived in one single $\theta_{1,2}$-step.

We conclude that all $n(n-1)/2$ contradictory cycles were processed before the
empty clause was derived. Surprisingly, after removing redundant clauses, in
intermediate steps the total number of clauses was never greater than the original
number of clauses.



9 Concluding Remarks and Further Research 

We developed a new rule for reasoning with E-CNFs. We proved its soundness
and completeness. We proposed an algorithm based on this rule for satisfiability
of E-CNFs, and also proved soundness and completeness of this procedure. So far
we have this procedure only in high-level pseudo-code. Many implementation
details have not yet been considered. However, on a theoretical level we analyzed
the complexity of our procedure when applied to a particular formula, yielding
polynomial complexity, while standard approaches applied to this formula exhibit
exponential behavior. This is quite hopeful for our new approach, and as
a next step we will implement our procedure and run experiments with real
benchmarks. Our procedure can also be modified, e.g., by removing redundant
clauses in a repeated manner.
Abstract. We present a probability logic (essentially a first order language
extended with quantifiers that count the fraction of elements in a
model that satisfy a first order formula) which, on the one hand, captures
uniform circuit classes such as $AC^0$ and $TC^0$ over arithmetic models,
namely, finite structures with linear order and arithmetic relations,
and whose semantics, on the other hand, with respect to our arithmetic
models, can be closely approximated by giving interpretations of its
formulas on finite structures where all relations (including the order) are
restricted to be "modular" (i.e. to act subject to an integer modulus). In
order to give a precise measure of the proximity between satisfaction of
a formula in an arithmetic model and satisfaction of the same formula in
the "approximate" model, we define the approximate formulas and work
on a notion of approximate truth. We also indicate how to enhance the
expressive power of our probability logic in order to capture polynomial
time decidable queries.

There are various motivations for this work. As of today, there is no
known logical description of any computational complexity class below
NP which does not require a built-in linear order. Also, it is widely
recognized that many model theoretic techniques for showing definability in
logics on finite structures become almost useless when order is present.

Hence, if we want to obtain significant lower bound results in computational
complexity via the logical description we ought to find ways of
by-passing the ordering restriction. With this work we take steps towards
understanding how well we can approximate, without a true order, the
expressive power of logics that capture complexity classes on ordered
structures.

1 Introduction 

The logical description of many computational complexity classes is based on the
fact that the possible domains of interpretations must be at least partially
ordered. This is certainly the case for logics meant for describing complexity classes
below NP, for it is still unknown whether such classes can be described without
any order, and it is further believed that this is not the case (further comments in [5]
and see also [4]). However, a negative aspect of describing low complexity classes
by logics with built-in order is that model theoretic techniques for showing
inexpressibility, such as Ehrenfeucht-Fraïssé games and their variations, become
almost useless and thus, in turn, hopeless for leading to significant complexity
lower bounds. (For an illustration of how difficult it is to play Ehrenfeucht-Fraïssé
games on ordered structures see Section 6.6 of [5].)

This dichotomy with the order has led researchers into exploring ways of
keeping some order in the models for various forms of extensions of first order
logic, and yet obtaining some significant lower bound results (for example, see [3]
and [7]). The results presented in this paper belong to that line of research.
We introduce a probability logic $\mathcal{L}P$, which is, essentially, first order
logic extended with quantifiers that count the fraction of elements in a model
that satisfy a first order formula. Our definition of the logic $\mathcal{L}P$ is inspired by the
probability logic of Keisler (see [6]), who conceived it as a logic appropriate for
his investigations on probability hyperfinite spaces, or infinite structures suitable
for approximating large finite phenomena of applied mathematics. In order to
suit our need of this logic for describing computability problems, we restrict our
use of relation symbols to a finite set, mainly of the arithmetic type: addition,
multiplication and order. With this ability to approximately count and in
the presence of built-in order, addition, and multiplication, fragments of this $\mathcal{L}P$
logic are capable of fully describing circuit classes such as $AC^0$ and $TC^0$, since
they coincide with known logics that capture these computational complexity
classes, for example, first order logic extended with threshold quantifiers.
Following our programme of studying possible ways of reducing the scope of the
order and other arithmetic relations within our models, we group in the same set
of witnesses of a formula all those elements that are congruent modulo the value
of a sublinear function $F$, and define the concept of an $F$-modular approximation
of a finite structure $A$. The $F$-modular approximation of $A$ thus obtained
does not have the order built in but just an approximation of it, and subject to these
interpretations we do get separation results among fragments of the corresponding
logic $\mathcal{L}P_F$, for a particular family of (sublinear) functions $F$.

Having satisfied our goal of obtaining inexpressibility results within our prob- 
ability logic under a weaker interpretation of the atomic symbols, we wonder 
how to translate that result to an inexpressibility of the same query (or similar 
query) in the logic with the unrestricted interpretation of symbols (e.g. full linear 
order). As a partial answer to this question we introduce the notion of approx- 
imate formulas and through them we establish a bridge between satisfaction in 
the structures with natural interpretations of the symbols and their correspond- 
ing F-modular approximation. In the last section of the paper we show how to 
extend this probability logic and approximations to capture P. 
2 Logic of Probability Quantifiers

We work with finite vocabularies and finite models. A vocabulary or signature
$\tau$ is a set of relation symbols and constant symbols. The models for $\tau$ will be
denoted by $A_m$, $B_n$, $C_k$, etc., where the subscripts refer to the cardinality of
the model. A logic over the vocabulary $\tau$ will be denoted $\mathcal{L}(\tau)$. In particular
$FO(\tau)$ is the set of first order formulas over $\tau$ (or $\tau$-formulas). The logic we are
mainly concerned with in this paper is the logic of probability quantifiers, which we
define below. Given a natural number $m$ and a set $C \subseteq \{0, \ldots, m - 1\}$ we can
define the natural probability $\mu_m(C)$ as just the cardinality of $C$ divided by $m$.
Likewise, for $s > 0$, we can define, for every set $C \subseteq \{0, \ldots, m - 1\}^s$, the natural
probability $\mu_m^s(C)$ as the cardinality of $C$ divided by $m^s$.

Definition 1. For a vocabulary $\tau$, we define the logic of probability quantifiers
(or probability logic) over $\tau$ as the set of formulas $\mathcal{L}P(\tau)$ formed as follows:

Atomic formulas. Formulas of the form $R(\bar{x}, \bar{c})$, where $R$ is a relation symbol
in $\tau$, $\bar{x}$ is a vector of variables, and $\bar{c}$ is a vector of constants from $\tau$, are in
$\mathcal{L}P(\tau)$.

Conjunction. If $\phi_1(\bar{x}), \phi_2(\bar{x}) \in \mathcal{L}P(\tau)$ then $\phi_1(\bar{x}) \land \phi_2(\bar{x})$ is in $\mathcal{L}P(\tau)$.

Negation. If $\phi(\bar{x}) \in \mathcal{L}P(\tau)$ then $\neg\phi(\bar{x}) \in \mathcal{L}P(\tau)$.

Existential quantification. If $\phi(\bar{x}, z) \in \mathcal{L}P(\tau)$ and $z$ is a variable not appearing
in $\bar{x}$, then $\exists z \phi(\bar{x}, z) \in \mathcal{L}P(\tau)$.

Probability quantification. Fix a rational number $r$, $0 < r < 1$. If $\phi(\bar{x}, z) \in
\mathcal{L}P(\tau)$ and $z$ is a variable not appearing in $\bar{x}$, then
$(P(z) > r)\phi(\bar{x}, z)$ and $(P(z) \ge r)\phi(\bar{x}, z)$ are in $\mathcal{L}P(\tau)$.

We define the following abbreviations: $(P(z) < r)\phi(\bar{x}, z)$ stands for $\neg(P(z) \ge
r)\phi(\bar{x}, z)$, and $(P(z) \le r)\phi(\bar{x}, z)$ stands for $\neg(P(z) > r)\phi(\bar{x}, z)$. Likewise
$\forall z \phi(\bar{x}, z)$ stands for $\neg\exists z \neg\phi(\bar{x}, z)$ and $\phi \lor \psi$ stands for $\neg(\neg\phi \land \neg\psi)$.

We define the interpretation of the formulas in $\mathcal{L}P(\tau)$ in a finite structure $B_m$
($m \in \mathbb{N}$) by induction on formulas, with the usual interpretations for conjunction,
negation and existential quantifier. The interpretation for a formula $(P(z) >
r)\phi(\bar{x}, z)$ in $B_m$ is as follows:

$$B_m \models (P(z) > r)\phi(\bar{a}, z) \quad\text{iff}\quad \mu_m(\{z < m : B_m \models \phi(\bar{a}, z)\}) > r$$

Likewise, the interpretation of the formula $(P(z) \ge r)\phi(\bar{x}, z)$ is as follows:

$$B_m \models (P(z) \ge r)\phi(\bar{a}, z) \quad\text{iff}\quad \mu_m(\{z < m : B_m \models \phi(\bar{a}, z)\}) \ge r$$

Observe that under this interpretation, $\neg(P(z) > r)\phi(\bar{x}, z)$ is equivalent to
$(P(z) \ge 1 - r)\neg\phi(\bar{x}, z)$, and $\neg(P(z) \ge r)\phi(\bar{x}, z)$ is equivalent to $(P(z) >
1 - r)\neg\phi(\bar{x}, z)$.
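
The semantics of the two probability quantifiers, and the duality just observed, can be sketched as follows. This is an illustrative sketch only: the function names `mu`, `p_greater` and `p_at_least` are hypothetical, and a formula with one free variable is modelled as a Python predicate on $\{0, \ldots, m - 1\}$.

```python
from fractions import Fraction

def mu(m, pred):
    """Natural probability of the set {z < m : pred(z)}."""
    return Fraction(sum(1 for z in range(m) if pred(z)), m)

def p_greater(m, r, pred):
    """Truth value of (P(z) > r) phi in a structure of size m."""
    return mu(m, pred) > r

def p_at_least(m, r, pred):
    """Truth value of (P(z) >= r) phi in a structure of size m."""
    return mu(m, pred) >= r

# Example: phi(z) holds for the multiples of 3 below m = 6 (measure 1/3).
m = 6
phi = lambda z: z % 3 == 0
not_phi = lambda z: not phi(z)

for r in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
    # not (P(z) > r) phi   is equivalent to   (P(z) >= 1 - r) not-phi
    assert (not p_greater(m, r, phi)) == p_at_least(m, 1 - r, not_phi)
    # not (P(z) >= r) phi  is equivalent to   (P(z) > 1 - r) not-phi
    assert (not p_at_least(m, r, phi)) == p_greater(m, 1 - r, not_phi)
```

Exact rationals (`Fraction`) are used so that the boundary cases $\mu = r$ are compared without rounding.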

By $\mathcal{L}P$ we denote the union of all probability logics $\mathcal{L}P(\tau)$ taken over all
finite vocabularies. We shall also deal with the following fragments of $\mathcal{L}P$.

Definition 2. Let $\tau$ be a finite vocabulary. Let $r_1, r_2, \ldots, r_k$ be distinct natural
numbers. By $\mathcal{L}P(\tau)[r_1, r_2, \ldots, r_k]$ we understand the smallest subset of $\mathcal{L}P(\tau)$
containing the atomic formulas and closed under conjunction, negation, existential
quantification and the probability quantifiers $P(z) > q_{ij}/r_i$, $P(z) \ge q_{ij}/r_i$,
where $i \le k$ and $q_{ij}$ are natural numbers such that $0 < q_{ij} < r_i$.

We have in mind using this type of logic for describing computational properties,
and for that purpose we restrict the semantics to finite models and also the kind of
relation symbols for building our formulas. In general we restrict our symbols to
be numerical (in the sense explained in [5]), and in particular we fix throughout
this paper the vocabulary $\Gamma = \{\oplus, \otimes, \lhd, 0, 1\}$, where $\oplus, \otimes$ are ternary relation
symbols, $\lhd$ is a binary relation symbol, and 0 and 1 are constant symbols.
Furthermore, we fix a generic vocabulary $\Gamma^+$ that contains $\Gamma$ and a finite set
of other numerical relation symbols $R_s$ and a finite set of other constant symbols $c_w$.

We define the arithmetic structures over $\Gamma^+$ as the finite structures $A_m$ of the
form $A_m = \langle \{0, 1, \ldots, m - 1\}, \oplus, \otimes, \lhd, 0, 1 \rangle$, where the relation
symbols $\oplus$, $\otimes$, $\lhd$ are interpreted as the usual addition, multiplication and
order in the set $\{0, 1, \ldots, m - 1\}$.

We will refer to the probability logic restricted to finite structures that are
arithmetic as $\mathcal{L}P_A$. The following examples show that the logic $\mathcal{L}P_A$ contains
fragments that are relevant to Descriptive Complexity Theory.

Example 1. Let $FO(\Gamma)$ be the first order logic over $\Gamma$ and consider the
interpretation of the symbols in $\Gamma$ as natural addition, multiplication and linear
order. It is shown in [1] (see also [5]) that this logic captures the complexity
class DLOGTIME-uniform $AC^0$, where $AC^0$ is defined as the class of problems
accepted by polynomial size, constant depth circuits with unbounded fan-in.
This logic corresponds to the smallest subset of $\mathcal{L}P_A$ that contains the atomic
$\Gamma$-formulas and is closed under $\land$, $\neg$ and $\exists z$.



Example 2. Let $FO(\Gamma) + M$ be the first order logic over $\Gamma$ with the
interpretations of the symbols in $\Gamma$ fixed as in the previous example, extended with
the majority quantifier $M$, which is defined as follows: if $\phi(\bar{x}, z)$ is a formula
with one free variable $z$, then $(Mz)\phi(\bar{a}, z)$ is a well defined sentence, which is
true if and only if $\phi(\bar{a}, z)$ is true for more than half of the possible values for
$z$. It is shown in [1] (see also [5]) that this logic captures the complexity class
DLOGTIME-uniform $TC^0$, where $TC^0$ is the class of problems accepted by
circuits of polynomial size, constant depth and unbounded fan-in threshold gates
(gates which count their Boolean inputs of value 1 and compare the total with
some prefixed number to determine their output). Note that this logic is the
fragment of $\mathcal{L}P_A$ that contains the atomic $\Gamma$-formulas and is closed under $\land$, $\neg$, $\exists z$
and the quantifier $P(z) > 1/2$, that is, $\mathcal{L}P(\Gamma)[2]$.

Our purpose is to approximate the expressive power of arithmetic relations
occurring naturally in finite model theory by arithmetic relations that are "weaker"
yet perform better under definability tools such as Ehrenfeucht-Fraïssé
games. Our choice of candidates for relations to approximate the natural
arithmetic relations are those that are "modular" in a number theoretic sense.

By $a \equiv_q b$ we mean that the number $a$ is congruent to the number $b$ modulo
$q$. Furthermore, given $\bar{a} = (a_1, \ldots, a_k)$ and $\bar{b} = (b_1, \ldots, b_k)$, two vectors of natural
numbers of equal length, we write $\bar{a} \equiv_q \bar{b}$ as an abbreviation of $a_1 \equiv_q b_1$, $a_2 \equiv_q
b_2$, $\ldots$, $a_k \equiv_q b_k$. Also, whenever we write $\bar{a} < m$ for some number $m$, we mean
that $a_1 < m, \ldots, a_k < m$.

We say that a function $F : \mathbb{N} \to \mathbb{N}$ is sublinear if for every natural
$m > 0$, $0 < F(m) \le m$.

Definition 3. Fix a sublinear function $F$, a formula $\theta(\bar{x}) \in \mathcal{L}P(\Gamma^+)$ and a $\Gamma^+$-model
$B_m$. The formula $\theta(\bar{x})$ is $F$-modular in $B_m$ iff the following condition
holds:

— For every $\bar{a}, \bar{b} < m$, if $\bar{a} \equiv_{F(m)} \bar{b}$ then ($B_m \models \theta(\bar{a})$ iff $B_m \models \theta(\bar{b})$).

We will say that a collection of formulas $\{\theta_i(\bar{x})\}_i \subseteq \mathcal{L}P(\Gamma^+)$ is $F$-modular in
$B_m$ iff every formula $\theta_i$ is $F$-modular in $B_m$.

The next lemma states that modularity is preserved by the logical operations
and quantification of $\mathcal{L}P(\Gamma^+)$. The proof is an easy induction on formulas.

Lemma 1. If the collection of atomic $\Gamma^+$-formulas is $F$-modular for a structure
$B_m$ then every formula in $\mathcal{L}P(\Gamma^+)$ is $F$-modular for $B_m$. □

The direct consequence of the above lemma is that the $F$-modularity of the
formulas in $\mathcal{L}P(\Gamma^+)$ in a model $B_m$ depends only on the modularity of the
interpretation of the relation symbols in $B_m$. Because of this fact, every model
where all the interpretations of the relation symbols are $F$-modular will be called
an $F$-modular structure.

Remark 1. For all natural numbers $e$ and $f > 0$ we understand $[e]_f$ to
be the remainder of dividing $e$ by $f$. For any vector of natural numbers $\bar{a} =
(a_1, a_2, \ldots, a_d)$, we understand by $[\bar{a}]_f$ the vector $([a_1]_f, [a_2]_f, \ldots, [a_d]_f)$.

Definition 4. Fix a sublinear function $F$ and an arithmetic structure $A_m$. The
$F$-modular approximation of $A_m$ is a structure

$$A_m^F = \langle \{0, 1, \ldots, m - 1\}, \oplus, \otimes, \lhd, 0, 1 \rangle$$

such that for every $a, b, c, a_1, \ldots, a_r < m$,

$$A_m^F \models \oplus(a, b, c) \quad\text{iff}\quad A_m \models \oplus([a]_{F(m)}, [b]_{F(m)}, [c]_{F(m)}),$$

$$A_m^F \models \otimes(a, b, c) \quad\text{iff}\quad A_m \models \otimes([a]_{F(m)}, [b]_{F(m)}, [c]_{F(m)}),$$

$$A_m^F \models \lhd(a, b) \quad\text{iff}\quad A_m \models \lhd([a]_{F(m)}, [b]_{F(m)}),$$

$$A_m^F \models R_s(a_1, \ldots, a_r) \quad\text{iff}\quad A_m \models R_s([a_1]_{F(m)}, \ldots, [a_r]_{F(m)}).$$

It is easy to see that for every arithmetic structure $A_m$, the structure $A_m^F$ is
$F$-modular. We also remark that for every relation symbol $R_s$, the
set $\{(a_1, \ldots, a_r) < m : A_m^F \models R_s(a_1, \ldots, a_r)\}$ and the set $\{(a_1, \ldots, a_r) < m :
A_m \models R_s(a_1, \ldots, a_r)\}$ coincide on the set $\{(a_1, \ldots, a_r) : a_1, \ldots, a_r < F(m)\}$.
These two remarks justify the name of $F$-modular approximation of $A_m$.
3 Modular Logics

Here is an example of a class of sublinear functions with some nice properties.
These functions will play an important role in the rest of this paper.

Example 3. Fix a natural $n > 0$. For every natural number $m$, let $t, r$ be the
unique natural numbers such that $m = tn + r$ and $0 \le r < n$. Define

$$g_n(m) = \begin{cases} tn & \text{if } m \ge n \\ 1 & \text{otherwise} \end{cases}$$

For every $n$, $g_n$ is sublinear. Furthermore, $\lim_{m \to \infty} g_n(m)/m = 1$. Also, for
every $n$, $g_n$ is first order definable in the following sense: there exists a formula
$\theta_n(x) \in FO$ with built-in order, addition and multiplication such that for every
arithmetic model $A_m$, for any $a < m$, $A_m \models \theta_n(a)$ iff $a + 1 = g_n(m)$. Here is why:
Note first that in every $A_m$ it is possible to capture the property that $x$ is the
maximal element with a formula $Max(x) \in FO(\{\oplus, \otimes, \lhd, 0, 1\})$ that says that
$\neg\exists z\, \oplus(x, 1, z)$. Likewise, we can say that "the size of the model $= tn + r$", with $r <
n <$ (size of the model), by a formula $DIVSIZE(t, n, r) \in FO(\{\oplus, \otimes, \lhd, 0, 1\})$
that says that there exists $x$ such that $Max(x)$ and

$0 < r < n$ and $x = tn + (r - 1)$, or
$r = 0$ and $x = (t - 1)n + (n - 1)$.

It follows then that the statement $g_n$(size of model) $= h + 1$, for $n < m$, is
definable in the models $A_m$ by a formula in $FO(\Gamma)$ that says that:

$$G(h, n) := \exists t, r, z\, \big(DIVSIZE(t, n, r) \land [(\oplus(h, 1, z) \land \neg\oplus(0, 0, r) \land \otimes(t, n, z)) \lor (\oplus(0, 0, r) \land Max(h))]\big)$$

For the case when $n = m$, we know that $g_n(m) = m$, in which case we can define
$h$ as $Max(h)$. Finally, if $m < n$ we know that $g_n(m) = 1$ and we can define
$h = 0$.
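
The function $g_n$ of Example 3 is easy to compute directly, which gives a quick check of its sublinearity and of the behaviour of the ratio $g_n(m)/m$. This is a minimal sketch; the Python name `g` is a hypothetical stand-in for $g_n$.

```python
def g(n, m):
    """g_n(m): the largest multiple of n not exceeding m, or 1 if m < n."""
    return (m // n) * n if m >= n else 1

# Sublinearity: 0 < g_n(m) <= m for every m > 0.
for n in range(1, 8):
    for m in range(1, 200):
        assert 0 < g(n, m) <= m

# For m >= n at most n - 1 is lost, so g_n(m)/m >= 1 - (n - 1)/m,
# which tends to 1 as m grows.
for n in range(1, 8):
    for m in range(n, 200):
        assert m - g(n, m) <= n - 1

print(g(3, 10), g(3, 3), g(3, 2))  # prints "9 3 1"
```

The bound $m - g_n(m) \le n - 1$ for $m \ge n$ is the quantitative form of the limit statement used in the rest of the paper.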

Recall that we refer to the probability logic restricted to finite structures that
are arithmetic as $\mathcal{L}P_A$. The related logic restricted to modular approximations
of arithmetic structures is formalized below.

Definition 5. We denote by $\mathcal{L}P_F$ the probability logic restricted to structures
that are $F$-modular approximations of arithmetic structures, for $F$ a sublinear
function. Likewise, by $FO_F$ we understand the smallest fragment of $\mathcal{L}P_F$ that
contains the atomic formulas and is closed under $\exists$, $\neg$ and $\land$. Similarly, we define
$\mathcal{L}P_F[r_1, \ldots, r_k]$ as the smallest fragment of $\mathcal{L}P_F$ that is closed under $\exists$, $\land$ and
$(P(z) > q_{ij}/r_i)$ and $(P(z) \ge q_{ij}/r_i)$ for $i \le k$ and natural numbers $0 < q_{ij} < r_i$.
In particular, we define the modular probability logic

$$\mathcal{L}P_{MOD} = \bigcup_{n \in \mathbb{N}} \mathcal{L}P_{g_n}.$$

Likewise, we define

$$FO_{MOD} = \bigcup_{n \in \mathbb{N}} FO_{g_n} \quad\text{and}\quad \mathcal{L}P_{MOD}[r_1, \ldots, r_k] = \bigcup_{n \in \mathbb{N}} \mathcal{L}P_{g_n}[r_1, \ldots, r_k].$$

Note that the logics $FO_{MOD}$, $\mathcal{L}P_{MOD}[r_1, \ldots, r_k]$, $\mathcal{L}P_{MOD}$ have neither built-in
order nor built-in addition nor built-in multiplication. Instead, for each $n$,
$FO_{g_n}$, $\mathcal{L}P_{g_n}[r_1, \ldots, r_k]$, $\mathcal{L}P_{g_n}$ have built-in $g_n$-modular approximations of the
order, addition and multiplication.

We now show that the expressive power of $\mathcal{L}P_{MOD}$ (respectively $\mathcal{L}P_{MOD}[r_1,
\ldots, r_k]$, $FO_{MOD}$) is contained in the expressive power of $\mathcal{L}P_A$ (respectively
$\mathcal{L}P_A[r_1, \ldots, r_k]$, $FO$). Before proceeding, however, we need to clarify the
meaning of a boolean query in the context of modular logics.

Definition 6. Fix a vocabulary $\Gamma^+ = \Gamma \cup \{R_s\}_s \cup \{c_w\}_w$. A boolean
query for the modular logic $\mathcal{L}P_{MOD}(\Gamma^+)$ is a map $I : \{A_m^{g_n} : m, n \in \mathbb{N}\} \to
\{0, 1\}$, with the additional property that for every $1 \le n_1 < n_2$, for every $m > n_2$,
$I(A_m^{g_{n_1}}) = I(A_m^{g_{n_2}})$. We say that a boolean query is expressible in $\mathcal{L}P_{MOD}(\Gamma^+)$
(respectively $FO_{MOD}(\Gamma^+)$) iff there exists a sentence $\theta \in \mathcal{L}P(\Gamma^+)$ (respectively
$FO$) such that for every $n \in \mathbb{N}$, for every arithmetic structure $A_m$ with $m > n$,
$I(A_m^{g_n}) = 1$ iff $A_m^{g_n} \models \theta$.

The idea behind the above definition of a boolean query for $\mathcal{L}P_{MOD}$ is to
capture the notion that a query does not depend on the built-in order or
arithmetic predicates; instead it depends on notions that remain constant for all the
approximations $A_m^{g_n}$. For the rest of this section we fix again a vocabulary of the
form $\Gamma^+ = \Gamma \cup \{R_s\}_s \cup \{c_w\}_w$, where $R_s$ and $c_w$ are numeric relations and
constants.

Lemma 2. There exist formulas $ADD(x_1, x_2, x_3, y)$, $PRODUCT(x_1, x_2, x_3, y)$,
$ORDER(x_1, x_2, y)$ and, for every $s$, formulas $PRED_s(\bar{x}, y)$ in $FO(\Gamma^+)$, such
that for every natural $n$, for every arithmetic structure $A_m$ with $m > n$,

— For every $a, b, c < m$, $A_m^{g_n} \models \oplus(a, b, c)$ iff $A_m \models ADD(a, b, c, n)$.

— For every $a, b, c < m$, $A_m^{g_n} \models \otimes(a, b, c)$ iff $A_m \models PRODUCT(a, b, c, n)$.

— For every $a, b < m$, $A_m^{g_n} \models \lhd(a, b)$ iff $A_m \models ORDER(a, b, n)$.

— For every index $s$ and every $\bar{a} < m$, $A_m^{g_n} \models R_s(\bar{a})$ iff $A_m \models PRED_s(\bar{a}, n)$.
□



The previous lemma allows us to translate modular interpretations to natural
interpretations.

Corollary 1. Let $B$ be a boolean query expressible in $\mathcal{L}P_{MOD}$. Then this
query is also expressible in $\mathcal{L}P_A$. Likewise, any boolean query expressible in
$FO_{MOD}$ (respectively $\mathcal{L}P_{MOD}[r_1, \ldots, r_k]$) is also expressible in $FO_A$
(respectively $\mathcal{L}P_A[r_1, \ldots, r_k]$). □
The logic $\mathcal{L}P_{MOD}$ is capable of expressing queries such as the evenness of the
cardinality of a set, as we show in the next example.

Example 4. We claim that there exists a sentence $\theta_2$ in $\mathcal{L}P(\{\oplus, \otimes, \lhd, 0, 1\})$ such
that for all $n$, for every arithmetic structure $A_m$ with $m > n$,

$$A_m^{g_n} \models \theta_2 \quad\text{iff}\quad m \text{ is even.}$$

To prove this, note first that for all naturals $m > n \ge 1$ and every $c$ such that
$g_n(m) > c \ge m - g_n(m)$,

$$\{y < m : A_m^{g_n} \models c \lhd y \lor \oplus(0, y, c)\} = \{y < m : c \le y \le g_n(m) - 1\}$$

and this implies that for every $c$ such that $g_n(m) > c \ge m - g_n(m)$,

$$\mu_m(\{y < m : A_m^{g_n} \models c \lhd y \lor \oplus(0, y, c)\}) = \frac{g_n(m) - c}{m} \qquad (1)$$

Fix now a natural $n$. Then there exists a natural $k$ such that for every $m > k$,
$g_n(m) \ge (3/4)m$ (since $\lim_{m \to \infty} g_n(m)/m = 1$). Let $m > k$ and consider the formula

$$\theta_2 := \exists x\, [(P(y) \ge 1/2)(x \lhd y \lor \oplus(0, x, y)) \land (P(y) \le 1/2)(x \lhd y \lor \oplus(0, x, y))]$$

We claim that for $m > n$, $A_m^{g_n} \models \theta_2$ iff $m$ is even. One direction goes as follows:
If $m = 2s$ and $g_n(m) \ge (3/4)m$ then $m - g_n(m) \le \frac{1}{2}s$. Taking $c$ as $s - m + g_n(m)$,
we have by equation (1) that

$$\mu_m(\{y < m : A_m^{g_n} \models c \lhd y \lor \oplus(0, y, c)\}) = \frac{m - s}{m} = \frac{1}{2}$$

For the other direction, suppose that there exists a $d < m$ such that

$$\mu_m(\{y < m : A_m^{g_n} \models d \lhd y \lor \oplus(0, y, d)\}) = \frac{1}{2}$$

From the fact that $A_m^{g_n}$ is $g_n$-modular we obtain that there exists an $a < g_n(m)$
such that

$$\mu_m(\{y < m : A_m^{g_n} \models a \lhd y \lor \oplus(0, y, a)\}) = \frac{1}{2}$$

which implies that

$$\mu_m(\{y < m : A_m^{g_n} \models \neg(a \lhd y) \land \neg\oplus(0, y, a)\}) = \mu_m(\{y < m : A_m^{g_n} \models y \lhd a\}) = \frac{1}{2} \qquad (2)$$

Note now that $a$ cannot be $< m - g_n(m)$ because, if this was the case then from
$g_n$-modularity we have that

Thus $m - g_n(m) \le a < g_n(m)$. We can now apply equation (1) to obtain
that

$$\frac{1}{2} = \mu_m(\{y < m : A_m^{g_n} \models a \lhd y \lor y = a\}) = \frac{g_n(m) - a}{m}$$

Hence $\frac{1}{2} = (g_n(m) - a)/m$, that is, $g_n(m) - a = m/2$, so $m$ must be even.

In a similar way, one can prove that for every natural $d > 2$, there exists a
formula $\theta_d$ in $FO + \{P(z) > 1/d, P(z) > (d - 1)/d\}(\{\oplus, \otimes, \lhd, 0, 1\})$ such that
for every natural $n$, for every arithmetic structure $A_m$ with $m > n$, $A_m^{g_n} \models
\theta_d$ iff $m$ is a multiple of $d$.

A consequence of the above example is that the boolean query "the size of the
model is divisible by $d$", for $d > 1$, is expressible in $(FO + \{P(z) > 1/d, P(z) >
(d - 1)/d\})_{MOD}$.
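
The evenness claim of Example 4 can be tested by brute force on small modular approximations. The sketch below rests on assumptions: it reads $\lhd$ as the strict order and $\oplus(0, x, y)$ as $0 + x = y$, both evaluated after reducing arguments modulo $g_n(m)$; the names `g` and `theta2_holds` are hypothetical; and only sizes $m \ge 4n$ are checked, where $g_n(m) \ge (3/4)m$ is guaranteed.

```python
from fractions import Fraction

def g(n, m):
    """g_n(m) from Example 3."""
    return (m // n) * n if m >= n else 1

def theta2_holds(n, m):
    """Evaluate the sentence theta_2 in the g_n-modular approximation of A_m."""
    f = g(n, m)

    def body(x, y):
        # x ⊲ y or ⊕(0, x, y), with arguments reduced modulo g_n(m)
        x, y = x % f, y % f
        return x < y or 0 + x == y

    half = Fraction(1, 2)
    for x in range(m):
        measure = Fraction(sum(1 for y in range(m) if body(x, y)), m)
        # (P(y) >= 1/2) and (P(y) <= 1/2) together force measure = 1/2
        if measure >= half and measure <= half:
            return True
    return False

for n in range(1, 4):
    for m in range(4 * n, 4 * n + 12):
        assert theta2_holds(n, m) == (m % 2 == 0)
```

For example, with $n = 3$ and $m = 14$ we have $g_3(14) = 12$, and the witness $c = 7 - 14 + 12 = 5$ from the proof yields the set $\{5, \ldots, 11\}$ of measure $7/14$.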



4 Separation Results for Modular Logics 

In this section we prove separation results between fragments of $\mathcal{L}P_{MOD}$ defined
in Definition 2. Since a formula such as $\neg((P(z) > r)\phi)$ is equivalent to $(P(z) \ge
1 - r)\neg\phi$ (and $\neg((P(z) \ge r)\phi)$ is equivalent to $(P(z) > 1 - r)\neg\phi$) we can push all
negation symbols inside and, together with all well known ways of manipulating
quantifiers in a formula, we get the following prenex normal form for formulas
in $\mathcal{L}P$.

Theorem 1. For every formula $\phi(\bar{x}) \in \mathcal{L}P(\Gamma^+)[r_1, r_2, \ldots, r_k]$ there exists a
quantifier free formula $\theta(y_1, \ldots, y_w, \bar{x}) \in \mathcal{L}P(\Gamma^+)[r_1, r_2, \ldots, r_k]$ such that for
every structure $B_m$, for every vector of naturals $\bar{a} < m$,

$$B_m \models \phi(\bar{a}) \longleftrightarrow Q_1 y_1 Q_2 y_2 \cdots Q_w y_w\, \theta(\bar{y}, \bar{a})$$

where each quantifier $Q_s$ is either $\exists$ or $\forall$ or $(P(z) > q_{ij}/r_i)$ or $(P(z) \ge q_{ij}/r_i)$,
for some $i \in \{1, \ldots, k\}$ and some $0 < q_{ij} < r_i$. □

We proceed now to define the notion of an $F$-chain of models and the stronger
notion of a chain.

Definition 7. Fix a suhlinear function F. An F-chain of models C is a collec- 
tion of finite structures for F+ = F U U {cr}*=i with the following 

property: 

— For every relation symbol $R(\bar x)$ of $\Gamma^+$, for every two models $B_m, B_n$ in $C$ with
$m < n$ and $F(m) = F(n)$, and for every $\bar a < F(m)$, $B_m \models R(\bar a)$ iff $B_n \models R(\bar a)$.

A chain of models $C$ is a collection of finite structures for $\Gamma^+$ with the
following property:

— For every relation symbol $R(\bar x)$ of $\Gamma^+$, for every two models $B_m, B_n$ in $C$
with $m < n$ and for every $\bar a < m$, $B_m \models R(\bar a)$ iff $B_n \models R(\bar a)$.

In other words, chains are collections of models with inter-compatibility for
their predicates.

Remark 2. If $C$ is a chain of arithmetic models then, for every sublinear function
$F$, $C^F = \{A_m^F : A_m \in C\}$ is an $F$-chain.



Example 5. Let $\{A_m\}_{m=1}^{\infty}$ be the collection of arithmetic models for $\Gamma = \{\oplus,
\otimes, \lhd, 0, 1\}$. It is easy to check that this collection is a chain.

We are ready to obtain separation results for the expressive power of the
different modular logics. Our main tool is the following lemma, which establishes
conditions for elementary equivalence. It states that, for every sentence $\phi$ in
$\mathcal{LP}(\Gamma^+)$, models that are in the same chain and have almost the same size cannot
be distinguished by $\phi$.

Lemma 3. Let $F$ be a sublinear function and $C$ an $F$-chain of models. Let $r_1,
r_2, \ldots, r_k$ be distinct non zero natural numbers. Let $\phi(x_1, \ldots, x_s)$ be any formula
in $\mathcal{LP}(\Gamma^+)[r_1, r_2, \ldots, r_k]$. Then one of the following two possibilities holds:

1. For every two $F$-modular models $B_m$ and $B_{m+1}$ in $C$ such that $m + 1 > r_i$
and $m \equiv_{r_i} -1$, for every $i \le k$, and $F(m) = F(m+1)$, we have that, for every
$a_1, \ldots, a_s < m$, $B_m \models \phi(a_1, \ldots, a_s)$ implies $B_{m+1} \models \phi(a_1, \ldots, a_s)$, or

2. For every two $F$-modular models $B_m$ and $B_{m+1}$ in $C$ such that $m + 1 > r_i$
and $m \equiv_{r_i} -1$, for every $i \le k$, and $F(m) = F(m+1)$, we have that, for
every $a_1, \ldots, a_s < m$, $B_{m+1} \models \phi(a_1, \ldots, a_s)$ implies $B_m \models \phi(a_1, \ldots, a_s)$.

Proof. We proceed by induction on the quantifier rank of $\phi$.

Quantifier Free Formulas: By definition of $F$-chain, if $\phi(x_1, \ldots, x_s)$ is quantifier
free and $a_1, \ldots, a_s < F(m)$, we have that

$$B_m \models \phi(a_1, \ldots, a_s) \text{ if and only if } B_{m+1} \models \phi(a_1, \ldots, a_s).$$

We prove that this equivalence holds for $a_1, \ldots, a_s < m$. For each coordinate
$a_i$ such that $F(m) \le a_i < m$, pick $b_i < F(m)$ such that $b_i \equiv_{F(m)} a_i$,
and otherwise take $b_i = a_i$. Since $B_m$ and $B_{m+1}$ are $F$-modular,
$B_k \models \phi(a_1, \ldots, a_s) \iff B_k \models \phi(b_1, \ldots, b_s)$ for $k = m, m+1$. From this the
desired equivalence follows for $a_1, \ldots, a_s < m$ and $F(m) = F(m+1)$.

Existentially or Universally Quantified Formulas: These two cases are
not difficult to prove and we omit the proofs for lack of space. (Hint: the
direction from $B_{m+1}$ to $B_m$ uses that $B_{m+1}$ is $F$-modular.)

Probability Quantifiers: We assume that case 1. holds, that is, for $F$, $m$, $r_1,
\ldots, r_k$ as in the hypothesis and for every $a_1, \ldots, a_s < m$ and every $b < m$:

$$B_m \models \phi(a_1, \ldots, a_s, b) \text{ implies } B_{m+1} \models \phi(a_1, \ldots, a_s, b).$$

We have two cases to consider under these hypotheses.

We consider first the formula $(P(z) \ge q_{ij}/r_i)\phi(\bar a, z)$. Fix an arbitrary $m$
satisfying that $F(m) = F(m+1)$, $m + 1 > r_i$ and $m \equiv_{r_i} -1$ for every $i \le k$,
fix $a_1, \ldots, a_s < m$. Let $t$ be a natural number such that $m = t r_i + r_i - 1$.
Now, if $B_m \models (P(z) \ge q_{ij}/r_i)\phi(\bar a, z)$ and since $\gcd(r_i, m) = 1$, then

$$|\{z < m : B_m \models \phi(\bar a, z)\}| > \frac{q_{ij}(t r_i + r_i - 1)}{r_i} = q_{ij}(t+1) - \frac{q_{ij}}{r_i}$$

and since $q_{ij} < r_i$, we obtain that $|\{z < m : B_m \models \phi(\bar a, z)\}| \ge q_{ij}(t+1)$. By
induction hypothesis we get that

$$|\{z < m+1 : B_{m+1} \models \phi(\bar a, z)\}| \ge q_{ij}(t+1) = \frac{q_{ij}}{r_i}(t+1)(r_i) = \frac{q_{ij}}{r_i}(m+1),$$

which implies that $\mu_{m+1}(\{z < m+1 : B_{m+1} \models \phi(\bar a, z)\}) \ge q_{ij}/r_i$, that is,
$B_{m+1} \models (P(z) \ge q_{ij}/r_i)\phi(\bar a, z)$, which is the desired result.

Next we consider the formula $(P(z) > q_{ij}/r_i)\phi(\bar a, z)$ and we shall prove that
case 2. holds for this formula. Fix an arbitrary $m$ satisfying that $F(m) =
F(m+1)$, $m + 1 > r_i$ and $m \equiv_{r_i} -1$ for every $i \le k$, fix $a_1, \ldots, a_s < m$. Let $t$
be a natural number such that $m = t r_i + r_i - 1$. If $B_m \models (P(z) \le q_{ij}/r_i)\phi(\bar a, z)$
and since $\gcd(r_i, m) = 1$, then

$$|\{z < m : B_m \models \phi(\bar a, z)\}| < \frac{q_{ij}(t r_i + r_i - 1)}{r_i} = q_{ij}(t+1) - \frac{q_{ij}}{r_i}$$

and since $q_{ij} < r_i$, we obtain that

$$|\{z < m : B_m \models \phi(\bar a, z)\}| < q_{ij}(t+1).$$

By induction hypothesis we get that

$$|\{z < m+1 : B_{m+1} \models \phi(\bar a, z)\}| \le q_{ij}(t+1) = \frac{q_{ij}}{r_i}(t+1)(r_i) = \frac{q_{ij}}{r_i}(m+1),$$

which implies that $\mu_{m+1}(\{z < m+1 : B_{m+1} \models \phi(\bar a, z)\}) \le q_{ij}/r_i$, that is,
$B_{m+1} \models (P(z) \le q_{ij}/r_i)\phi(\bar a, z)$, which gives us case 2. for this formula.

The proofs for both types of probability quantifiers under the assumption
that case 2. holds for $\phi$ are just the contrapositive versions of the two cases
just proved. □



The above lemma can be used to prove separation of different fragments of
$\mathcal{LP}_{MOD}$.

Theorem 2. Let $r, r_1, r_2, \ldots, r_k$ be distinct non zero natural numbers, and such
that $r$ is relatively prime with each $r_1, \ldots, r_k$. Then $\mathcal{LP}_{MOD}[r_1, \ldots, r_k]$ is
properly contained in $\mathcal{LP}_{MOD}[r_1, \ldots, r_k, r]$.

Proof. It is obvious that $\mathcal{LP}_{MOD}[r_1, \ldots, r_k]$ is contained in $\mathcal{LP}_{MOD}[r_1, \ldots,
r_k, r]$. Furthermore, we saw (Example 4) that the query “the size of the model
is a multiple of $r$” is expressible in $\mathcal{LP}_{MOD}(\Gamma)[r]$. We will show that this query
is not expressible in $\mathcal{LP}_{MOD}[r_1, \ldots, r_k](\Gamma)$. More specifically, we will show that
there is no sentence $\phi$ in $\mathcal{LP}_{g_n}[r_1, \ldots, r_k](\Gamma)$ that defines the above query, where
gn is the sublinear function defined in Example 3, for all n > 

Recall that the collection of all arithmetic models $C = \{A_m\}_{m=1}^{\infty}$ forms
a chain. It follows that for every $n$, the collection $C^{g_n} = \{A_m^{g_n}\}_{m=1}^{\infty}$ forms a
$g_n$-chain. Suppose now that there exists a sentence $\phi$ in $\mathcal{LP}_{g_n}[r_1, \ldots, r_k](\Gamma)$ that
captures the query “the size of the model is a multiple of $r$” for all (except finitely
many) structures $A_m^{g_n}$. Then we can apply Lemma 3 and get the following:

For every two models $A_m^{g_n}$ and $A_{m+1}^{g_n}$ in $C^{g_n}$ such that $m + 1 > r_i$,
$m \equiv_{r_i} -1$ for every $i$, and $g_n(m) = g_n(m+1)$, we have that at least one
of the following two cases holds:

(1) $A_m^{g_n} \models \phi$ implies $A_{m+1}^{g_n} \models \phi$, or (2) $A_{m+1}^{g_n} \models \phi$ implies $A_m^{g_n} \models \phi$.

Suppose it is case 1. that is true. Using that $r$ is relatively prime with the $r_i$'s
together with the Generalized Chinese Remainder Theorem we can get a natural
number $b < (\prod_{i=1}^{k} r_i)\, r$ such that $b \equiv_{r_i} -1$ for every $i$ and $b \equiv_r 0$. Let $D$ be the
collection of naturals $m$ such that $m = r(\prod_{i=1}^{k} r_i)\, t n + b$ for some natural $t > 0$.
Clearly $m + 1 > r_i$, $m \equiv_{r_i} -1$ for every $i$, and $g_n(m) = g_n(m+1)$. Furthermore,
$D$ is infinite and for every $m \in D$, $m \equiv_r 0$. It follows that for almost all the
$m \in D$, $A_m^{g_n} \models \phi$ and, in consequence, for almost all the $m \in D$, $A_{m+1}^{g_n} \models \phi$, i.e.
for almost all elements $m$ of $D$, $m + 1$ is a multiple of $r$, which is impossible.

Suppose it is case 2. that is true. Then by a similar argument as above we
prove the existence of $b < (\prod_{i=1}^{k} r_i)\, r$ such that $b \equiv_{r_i} -1$ for every $i$ and $b \equiv_r -1$.
Let $D$ be defined as above, with this $b$. Then $D$ is infinite and for every $m \in D$, $m \equiv_r -1$.
It follows that for almost all the $m \in D$, $A_{m+1}^{g_n} \models \phi$ and, in consequence, for
almost all the $m \in D$, $A_m^{g_n} \models \phi$, i.e. for almost all elements $m$ of $D$, $m$ is a
multiple of $r$, which is impossible.

We conclude that such a sentence $\phi$ cannot exist in $\mathcal{LP}_{g_n}[r_1, \ldots, r_k](\Gamma)$. □
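The residue $b$ used in the two cases of the proof can be found by direct search on small inputs. The sketch below is illustrative only: `find_residue` and `members_of_D` are hypothetical helper names, the moduli in the tests are arbitrary sample values, and the side condition on $g_n$ is not checked because $g_n$ is defined elsewhere (Example 3).

```python
def find_residue(moduli, r, target_mod_r):
    """Smallest b >= 0 with b = -1 (mod r_i) for every r_i in `moduli`
    and b = target_mod_r (mod r); None if no such b exists below r * prod(moduli)."""
    bound = r
    for ri in moduli:
        bound *= ri
    for b in range(bound):
        if all((b + 1) % ri == 0 for ri in moduli) and b % r == target_mod_r % r:
            return b
    return None

def members_of_D(moduli, r, b, n, count=3):
    """First `count` elements m = r * prod(moduli) * t * n + b with t = 1, 2, ..."""
    step = r * n
    for ri in moduli:
        step *= ri
    return [step * t + b for t in range(1, count + 1)]
```

With `target_mod_r = 0` this corresponds to case 1 (every `m` in `D` is a multiple of `r`), and with `target_mod_r = -1` to case 2 (every `m + 1` is a multiple of `r`).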

Corollary 2. The expressive power of $FO_{MOD}$ is strictly weaker than the
expressive power of $\mathcal{LP}_{MOD}[2]$. □

This last result, for modular logics, corresponds to the separation of $FO$ and
$FO + M$ in the context of arithmetic models, which in turn is equivalent to the
separation of $AC^0$ from $TC^0$ shown by Ajtai and independently by Furst, Saxe
and Sipser (see [5] for a nice exposition of this result and references).

5 Approximating $\mathcal{LP}_A$ with $\mathcal{LP}_{MOD}$

We introduce the notion of approximate formulas. This concept will provide a 
link between satisfaction in arithmetic structures and satisfaction in modular 
approximations of these arithmetic structures. 







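Definition 8 (Approximate Formulas). For every formula in prenex normal
form $\theta(\bar x) \in \mathcal{LP}(\Gamma^+)$, for every $0 < \epsilon < 1$, we define the $\epsilon$-approximation of $\theta(\bar x)$
as follows:

Atomic formulas. If $\theta(\bar x) := R_s(\bar x, \bar c)$ then $\theta_\epsilon(\bar x) := R_s(\bar x, \bar c)$.

Negation of atomic formulas. If $\theta(\bar x) := \neg R_s(\bar x, \bar c)$ then $\theta_\epsilon(\bar x) := \neg R_s(\bar x, \bar c)$.

Conjunction. If $\theta(\bar x) := \phi(\bar x) \land \psi(\bar x)$ then $\theta_\epsilon(\bar x) := \phi_\epsilon(\bar x) \land \psi_\epsilon(\bar x)$.

Disjunction. If $\theta(\bar x) := \phi(\bar x) \lor \psi(\bar x)$ then $\theta_\epsilon(\bar x) := \phi_\epsilon(\bar x) \lor \psi_\epsilon(\bar x)$.

Existential quantification. If $\theta(\bar x) := \exists z\, \phi(\bar x, z)$ then $\theta_\epsilon(\bar x) := \exists z\, \phi_\epsilon(\bar x, z)$.

Universal quantification. If $\theta(\bar x) := \forall z\, \phi(\bar x, z)$ then
$\theta_\epsilon(\bar x) := (P(z) \ge 1 - \epsilon)\phi_\epsilon(\bar x, z)$.

Probability quantifiers. If $\theta(\bar x) := (P(z) > r)\phi(\bar x, z)$ then
$\theta_\epsilon(\bar x) := (P(z) > r - \min(\epsilon, r))\phi_\epsilon(\bar x, z)$.
If $\theta(\bar x) := (P(z) \ge r)\phi(\bar x, z)$ then $\theta_\epsilon(\bar x) := (P(z) \ge r - \min(\epsilon, r))\phi_\epsilon(\bar x, z)$.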

The next lemma provides the basic operational properties of the approximate 
formulas. 

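Lemma 4. For every formula (in prenex normal form) $\theta(\bar x) \in \mathcal{LP}(\Gamma^+)$, for
every $0 < \epsilon < 1$, for every finite structure $B_m$ and every vector $\bar a < m$ the
following holds:

— If $0 < \epsilon < \delta < 1$ then $B_m \models \theta_\epsilon(\bar a) \rightarrow \theta_\delta(\bar a)$.

— If $\{\epsilon_i\}_{i=1}^{\infty}$ is a sequence of real numbers less than 1 and converging to 0, then:
if ($\forall i \in \mathbb{N}$, $B_m \models \theta_{\epsilon_i}(\bar a)$) then $B_m \models \theta(\bar a)$.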

The purpose of the next theorem is to establish an “approximation” relationship
between satisfaction in the modular logic $\mathcal{LP}_{MOD}$ and satisfaction in
$\mathcal{LP}_A$ via the approximate formulas.

Theorem 3 (Bridge Theorem). Fix a natural $n$. For every formula in prenex
normal form $\theta(\bar x) \in \mathcal{LP}(\Gamma^+)$, for every arithmetic model $A_m$ with $m > n^2$, for
every $\bar a < g_n(m)$, the following holds: $A_m^{g_n} \models \theta(\bar a)$ implies $A_m \models \theta_{1/n}(\bar a)$.

Proof. By induction on the complexity of the formula.

Atomic formulas and negation of atomic formulas. (Hint: for atomic
formulas and their negation $\theta_{1/n}$ is the same as $\theta$.)

Conjunction, disjunction. Direct.

Existential quantifier. (Hint: Suppose $A_m^{g_n} \models \exists z\, \theta(\bar a, z)$. Then use Lemma 1
and that $A_m^{g_n}$ is $g_n$-modular to conclude $\theta(\bar x, z)$ is $g_n$-modular for $A_m^{g_n}$ and,
hence, $A_m^{g_n} \models \theta(\bar a, [c]_{g_n(m)})$ for some $c < m$.)

Universal quantifier. Suppose that $A_m^{g_n}$ satisfies the formula $\forall z\, \theta(\bar a, z)$. Then
for every $c < g_n(m)$ we have that $A_m^{g_n} \models \theta(\bar a, c)$. We can now apply the
induction hypothesis to obtain that for every $c < g_n(m)$ we have that $A_m \models
\theta_{1/n}(\bar a, c)$. Since $m > n^2$ we get that $\mu_m(\{c < m : A_m \models \theta_{1/n}(\bar a, c)\}) \ge 1 - \frac{1}{n}$, which
implies $A_m \models (P(z) \ge 1 - \frac{1}{n})\theta_{1/n}(\bar a, z)$.







Probability quantification. Suppose that $A_m^{g_n}$ satisfies the formula $(P(z) >
r)\theta(\bar a, z)$ for $0 < r < 1$. It follows that $|\{c < m : A_m^{g_n} \models \theta(\bar a, c)\}| > rm$. Then
we get that

$$|\{c < g_n(m) : A_m^{g_n} \models \theta(\bar a, c)\}| > rm - (m - g_n(m)).$$

Applying the induction hypothesis we obtain that

$$|\{c < m : A_m \models \theta_{1/n}(\bar a, c)\}| > rm - (m - g_n(m)).$$

It follows that

$$\mu_m(\{c < m : A_m \models \theta_{1/n}(\bar a, c)\}) > \frac{rm - (m - g_n(m))}{m} = r - \frac{m - g_n(m)}{m} \ge r - \frac{1}{n}, \quad \text{since } m > n^2.$$

But this last statement is just $A_m \models (P(z) > r - \frac{1}{n})\theta_{1/n}(\bar a, z)$. □

The gist of the above result is to give a quantifiable relationship between
satisfaction of a formula in the structures $A_m^{g_n}$ and satisfaction of its approximation
in $A_m$. It implies the following relationship between boolean queries captured
by $\mathcal{LP}_A$ and the boolean queries captured in $\mathcal{LP}_{MOD}$. (We will abbreviate by
$(\neg\theta)_\epsilon$, for $\theta \in \mathcal{LP}(\Gamma^+)$, the $\epsilon$-approximation of the formula equivalent to $\neg\theta$.)

Corollary 3. Assume there is a boolean query $B$, a natural $n$ and a formula
$\theta \in \mathcal{LP}(\Gamma^+)$ such that for every arithmetic model $A_m$, with $m > n^2$, if $A_m \models
\theta_{1/n}$ then $A_m \in B$, and if $A_m \models (\neg\theta)_{1/n}$ then $A_m \notin B$. Then for every $m > n^2$,
$A_m \in B$ iff $A_m^{g_n} \models \theta$. □



6 P and the Logic LP Extended 

The first problem shown to be complete for the class P, deterministic polynomial
time, was Path System Accessibility, due to Cook [2]. An instance of the Path
System Accessibility problem, which we abbreviate from now on as PS, is a finite
structure $A = \langle A, R, T, s \rangle$, or a path system, where the universe $A$ consists of,
say, $n$ vertices, a relation $R \subseteq A \times A \times A$ (the rules of the system), a source
$s \in A$, and a set of targets $T \subseteq A$ such that $s \notin T$. A positive instance of
PS is a path system $A$ where some target in $T$ is accessible from the source
$s$, where a vertex $v$ is accessible if it is the source $s$ or if $R(x, y, v)$ holds for
some accessible vertices $x$ and $y$, possibly equal. In [8], Stewart shows that PS is
complete for P via quantifier free first order reductions; in fact, via projections
(see [8] for definitions and also [5] Section 11.2). We will use that result to
show that an approximation version of PS, which we present in Example 6 below,
is also complete for P via reductions that are projections; this will help
us to show that a certain extension of our $\mathcal{LP}$ logic captures P on finite ordered
structures. (We remark that Stewart considers the path systems in [8] as having
only one target, and not a set of targets as we do here. However, one can see
that his results on completeness of PS via first order reductions hold also for
our version of this problem.)

Definition 9. Let $X$ be a second order variable of arity 1, and $\alpha(\bar x, X)$ a first
order formula over some (finite) vocabulary $\tau$ with first order variables $\bar x =
(x_1, \ldots, x_m)$ and second order variable $X$. Let $r \in [0, 1]$. Then

$(P(X) > r)\alpha(\bar x, X)$ and $(P(X) \ge r)\alpha(\bar x, X)$

are new formulas with the following semantics. For an appropriate finite $\tau$-model
$A_n$, and elements $\bar a = (a_1, \ldots, a_m)$ from $\{0, \ldots, n-1\}$, the universe of $A_n$,

$A_n \models (P(X) \ge r)\alpha(\bar a, X)$
$\iff$ the least subset $A \subseteq \{0, \ldots, n-1\}$ such that
$A_n \models \alpha(\bar a, A)$ has $|A|/n \ge r$.

Similarly for $(P(X) > r)\alpha(\bar a, X)$.

Example 6. Let $\tau = \{R, T, s\}$ where $R$ is a ternary relation symbol, $T$ is a unary
relation symbol and $s$ is a constant symbol. We think of $\tau$-structures as path
systems with source $s$, a target set $T$ and set of rules $R$. Let $r$ be a rational with
$0 < r < 1$. We define

$NPS_{\ge r} := \{A = \langle A, R, T, s \rangle : A$ is a path system and at least a fraction $r$ of
the elements accessible from $s$ are not in $T\}$

Let $\alpha_{NPS}(X)$ be the following formula (the constant symbol $\bot$ stands for false),

$\alpha_{NPS}(X) := \forall x (x = s \rightarrow X(x))$
$\land\ \forall x \forall y \forall z (X(x) \land X(y) \land R(x, y, z) \rightarrow X(z))$
$\land\ \forall x (X(x) \land T(x) \rightarrow \bot)$

Then

$A_n \in NPS_{\ge r} \iff A_n \models (P(X) \ge r)\alpha_{NPS}(X)$

$NPS_{\ge r}$ is an approximation version of the problem PS, definable by our
probability quantifiers over unary second order variables acting on formulas with
a particular form to which we give a name below.
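The least set satisfying the first two conjuncts of the Horn formula in Example 6 is the set of vertices accessible from the source, so the quantified formula can be evaluated by a fixed-point iteration. The following sketch is an illustration under one reading of Definition 9: when an accessible vertex lies in the target set, no set satisfies the third conjunct and the formula is taken to be false. The function names and the encoding of rules as triples are assumptions made for this example.

```python
from fractions import Fraction

def accessible(rules, source):
    """Least set containing `source` and closed under the rules:
    if (x, y, z) is a rule and x, y are in the set, then z is in the set."""
    acc = {source}
    changed = True
    while changed:
        changed = False
        for x, y, z in rules:
            if x in acc and y in acc and z not in acc:
                acc.add(z)
                changed = True
    return acc

def satisfies_alpha_nps(n, rules, targets, source, r):
    """Evaluate (P(X) >= r) alpha_NPS(X) on a path system with universe {0, ..., n-1}."""
    least = accessible(rules, source)
    if least & set(targets):
        # Every X closed under the rules contains `least`, so the third
        # conjunct (no element of X is a target) fails for all X.
        return False
    return Fraction(len(least), n) >= r
```

Each pass over the rules adds at least one vertex or stops, so the loop runs at most `n` times and the whole check is polynomial in the size of the structure.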

Definition 10. Let $\tau = \{R_1, \ldots, R_m, C_1, \ldots, C_k\}$ be some vocabulary with
relation symbols $R_1, \ldots, R_m$, and constant symbols $C_1, \ldots, C_k$, and let $X$ be a
unary second order variable. A first order formula $\alpha$ over $\tau \cup \{X\}$, and extra
symbols as $=$ (equality) and the constant $\bot$ (standing for false), is a universal
Horn formula, if $\alpha$ is the conjunction of universally quantified formulas over
$\tau \cup \{X\}$ of the form

$$\psi_1 \land \psi_2 \land \ldots \land \psi_s \rightarrow \varphi$$

where $\varphi$ is either $X(u)$ or $\bot$, and $\psi_1, \ldots, \psi_s$ are atomic $(\tau \cup \{X\})$-formulas
with any occurrence of the variable $X$ being positive (there are no restrictions
on the predicates in $\tau$ or $=$).

The logic $\mathcal{LP}_{Horn}$ is the set of formulas

$FO + \{(P(X) > r)\alpha_1(\bar x, X),\ (P(X) \ge r)\alpha_2(\bar x, X) : \alpha_i(\bar x, X)$ is a universal
Horn (first order) formula with second order variable $X\}$

Example 6 shows that the problem $NPS_{\ge r}$ is definable in $\mathcal{LP}_{Horn}$. We shall
see that this is true of all problems in P.

Lemma 5. The set of finite structures that satisfy a sentence $\theta$ in $\mathcal{LP}_{Horn}$ is
in P.

Proof. Let $\theta \in \mathcal{LP}_{Horn}$ be of the form

$$(P(X) \ge r)\Big[\bigwedge_{i=1}^{m} (\psi_{i1} \land \ldots \land \psi_{is} \rightarrow \varphi_i)\Big],$$

and let $A_n$ be a model of the appropriate vocabulary of size $n$. Then it is not
difficult to describe a polynomial time procedure that decides whether $A_n$ satisfies
the above sentence. □

Thus, according to this lemma, our problem $NPS_{\ge r}$ is in P. We show next
that it is hard for P.

Lemma 6. The problem $NPS_{\ge r}$ is complete for P via projections.

Proof. We exhibit a (successor free) projection from the complement of the
problem PS to $NPS_{\ge r}$. Let $A = \langle A, R, T, s \rangle$ be an instance of PS. Define
$A' = \langle A', R', T', s' \rangle$ as follows: its universe $A' = A^2$, and

$T' = T \times s = \{(x, s) : x \in T\}$

$R' = \{((x, s), (y, s), (z, s)) : (x, y, z) \in R\}\ \cup$
$\{((x, s), (y, s), (z, s)) : x \in T \land x \ne s \land y \in T \land y \ne s \land z \ne s\}$

$s' = (s, s)$

Then, $A \in PS \iff A' \notin NPS_{\ge r}$. □



Corollary 4. Every problem in P is a set of finite ordered structures that satisfy
a sentence in $\mathcal{LP}_{Horn}$.

Proof. Every problem in P is reducible to $NPS_{\ge r}$ via projections; $NPS_{\ge r}$ is
definable in $\mathcal{LP}_{Horn}$ and this logic is closed under projections. □

Corollary 5. Over finite ordered structures, the logic $\mathcal{LP}_{Horn}$ captures P. □

The logic $\mathcal{LP}_{Horn}$ verifies Lemma 1; namely, for a sublinear function $F$,
$F$-modularity is preserved. Indeed, we need only check formulas of the
form $(P(X) \ge r)\alpha(\bar z, X)$: Suppose $\bar a, \bar b < m$, $\bar a \equiv_{F(m)} \bar b$ and $B_m \models (P(X) \ge
r)\alpha(\bar a, X)$. Then there exists a $B \subseteq \{0, 1, \ldots, m-1\}$ such that $B_m \models \alpha(\bar a, B)$
and $|B| \ge rm$. The parameters in $\bar a$ do not occur in $B$; hence, by inductive
hypothesis, $B_m \models \alpha(\bar b, B)$. Thus, $B_m \models (P(X) \ge r)\alpha(\bar b, X)$. □

Abstract. We introduce a new model of “generic discrete log algorithms”
based on arithmetic circuits. It is conceptually simpler than
previous ones, is actually applicable to the natural representations of
the popular groups, and we can derive upper and lower bounds that
differ only by a constant factor, namely 10.

Keywords. Discrete logarithm, generic algorithm, arithmetic circuit, 
cyclic group 



1 Introduction 

Discrete logarithm computations and their presumed difficulty are a central topic
in cryptography. Let $G$ be a finite cyclic group of order $d$, $p$ the largest prime
divisor of $d$, and $n$ the bit length of $d$ (that is, $n$ is the “private key length”).
There are three types of results:

— “Generic” algorithms such as baby-step giant-step, Pollard rho, and Pohlig-Hellman.
Together they provide a solution with $O(n\sqrt{p} + n^2)$ group operations.

— Algorithms for special groups, such as the index calculus for the group of
units in a finite field, and Weil descent for special elliptic curves.

— Lower bounds $\Omega(\sqrt{p})$ on “generic” algorithms.

This paper proposes a new solution to the last point.

Babai & Szemerédi (1984) first proposed a model in which even a lower
bound $\Omega(p)$ holds. Then Nechaev (1994) suggested a deterministic model with
an $\Omega(\sqrt{p})$ bound, and Boneh & Lipton (1996) considered finite fields. The most
popular model was invented by Shoup (1997). It is probabilistic, has an $\Omega(\sqrt{p})$
lower bound, and also works for the Diffie-Hellman problem. Maurer & Wolf
(1998, 1999) continued to work on this, in particular by relating the two questions
of discrete logarithms and the Diffie-Hellman task. See also Schnorr & Jakobsson
(2000) and Schnorr (2001).

An essential ingredient of Shoup’s method is a bit representation of the group
elements, and his lower bound holds for a random description of this form. The
standard “generic” algorithms consist of two phases: first some group calculations
are performed, and in a second phase the resulting lists of group elements
are sorted, with the goal of finding a collision. Of course, when one wants to
implement such an algorithm, one has to use some bit representation of the group
elements in computer memory. But the algorithms will use one “natural”
representation, not random ones. Strictly speaking, Shoup’s result does not apply to
this situation, and thus does not provide a lower bound in the natural setting.

This paper repairs this state of affairs by presenting a new model for “generic” 
discrete log computations which is both technically simpler and more powerful. 
It has the following properties: 

— the known “generic” algorithms fit in, 

— a lower bound of $\Omega(\sqrt{p})$ holds,

— it does not make assumptions about the representation of groups, 

— there is a matching upper bound, larger only by a constant factor. 

This is basically achieved by ignoring the second phase, where sorting occurs. 
Then one can do away with the group representation, and describe the first phase 
in a simple arithmetic model. 

It is important to note that the goal here is not a way of describing useful 
discrete log computations. In fact, our computations do not calculate discrete 
logs, but any “generic” discrete log computation yields one of our type. The 
asymptotically matching upper and lower bounds are an indication that this 
may be the “right” level of abstraction. 

The most natural way of saying that we “only want to use group opera- 
tions” is by using arithmetic circuits (a.k.a. straight-line programs) with group 
operations. This model was introduced in great generality by Strassen (1972). 
However, a circuit computes only group elements and not discrete logs, which 
live in the “exponent group” . Success in the usual algorithms is signalled by a 
collision, where the same group element is calculated in two different ways. The 
basic idea is to declare a circuit as successful if it produces such a collision. One 
has to be a bit careful: it is easy to produce trivial collisions, say by calculating 
the group element 1 in two different ways. This leads to our notion of a collision 
“respecting” a divisor q of the group order: it is not trivial in the “exponent 
group modulo g”. 

In Section 2, we set up the required notions. Section 3 starts with the usual
“nonzero preservation” result modulo a prime power; it is somewhat simplified
in comparison with other generic models by considering only linear polynomials.
Technically, this Lemma 7 is the main overlap with Shoup’s method. Then we
prove the main result, a lower bound of $\Omega(\sqrt{p})$ in Theorem 8. The model is
sufficiently powerful (or weak, as you have it) that essentially matching upper and
lower bounds hold; they differ only by a constant factor, namely 10 (Corollary
10).

The model so far is deterministic; Section 4 extends it to probabilistic
computations. The same lower bound holds. This is no surprise, since randomized
algorithms such as Pollard’s rho method do not reduce the computing time.
This method is important because it reduces the required memory to a constant
number of group elements, but we do not consider this resource.

2 Arithmetic Circuits for Discrete Logarithms

We fix the following notation:

$G = \langle g \rangle$ is a finite cyclic group, $d = \#G$, (1)

$p$ is the largest prime divisor of $d$, and $n$ is the binary length of $d$.

We consider algorithms that use only the group operations, starting with 
three special group elements: 1, the generator g, and x. From these we build 
further group elements by multiplication and inversion. 

Example 2. Here is a formulation of the baby-step giant-step algorithm for
$d = 20$:

instruction                               trace       trace exponent
$y_{-2} \leftarrow 1$                     $1$         $0$
$y_{-1} \leftarrow g$                     $g$         $1$
$y_0 \leftarrow x$                        $x$         $t$
$y_1 \leftarrow y_0 \cdot y_{-1}$         $xg$        $t + 1$
$y_2 \leftarrow y_1 \cdot y_{-1}$         $xg^2$      $t + 2$
$y_3 \leftarrow y_2 \cdot y_{-1}$         $xg^3$      $t + 3$
$y_4 \leftarrow y_3 \cdot y_{-1}$         $xg^4$      $t + 4$
$y_5 \leftarrow y_4 \cdot y_{-1}$         $xg^5$      $t + 5$
$y_6 \leftarrow y_5 \cdot y_0^{-1}$       $g^5$       $5$
$y_7 \leftarrow y_6 \cdot y_6$            $g^{10}$    $10$
$y_8 \leftarrow y_7 \cdot y_6$            $g^{15}$    $15$
$y_9 \leftarrow y_8 \cdot y_6$            $g^{20}$    $20$
$y_{10} \leftarrow y_9 \cdot y_6$         $g^{25}$    $25$

The “trace” gives the group element computed in each step. The “trace
exponent” is explained below. The algorithm is in its simplest form, ignoring
shortcuts like $g^{20} = 1$.

If $\log_g x = 5b + c$, with $0 \le b, c < 5$, then $x = g^{5b+c}$, hence $xg^{5-c} = g^{5(b+1)}$,
and both elements appear in the computation. If we take $G = \mathbb{Z}_{25}^{\times} = \langle 2 \rangle$,
a group of order 20, and $x = 19 = 2^{18}$, then we have $18 = 5 \cdot 3 + 3$ and
$y_2 = xg^2 = g^{20} = y_9$.
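The numerical example can be replayed directly. The sketch below is an illustration, not part of the paper's model: it hard-codes the thirteen steps of the table for the unit group modulo 25 with generator 2, and the function names are assumptions. It relies on Python 3.8 or later, where the built-in `pow` accepts exponent `-1` together with a modulus to compute a modular inverse.

```python
def bsgs_trace(x, g=2, modulus=25):
    """Replay the table of Example 2: returns {k: y_k} for k = -2, ..., 10."""
    y = {-2: 1, -1: g % modulus, 0: x % modulus}
    for k in range(1, 6):                      # baby steps y_1..y_5 = x*g, ..., x*g^5
        y[k] = y[k - 1] * y[-1] % modulus
    y[6] = y[5] * pow(y[0], -1, modulus) % modulus   # y_6 = y_5 * y_0^{-1} = g^5
    for k in range(7, 11):                     # giant steps y_7..y_10 = g^10, ..., g^25
        y[k] = y[k - 1] * y[6] % modulus
    return y

def first_collision(y):
    """First pair (i, j) with a baby step y_i equal to a giant step y_j, or None."""
    for i in range(1, 6):
        for j in range(6, 11):
            if y[i] == y[j]:
                return i, j
    return None

def log_from_collision(i, j, d=20):
    """y_i = x*g^i and y_j = g^(5*(j-5)), so log_g x = 5*(j-5) - i modulo d."""
    return (5 * (j - 5) - i) % d
```

For the input $x = 19$ the only baby-step/giant-step collision is between steps 2 and 9, and the recovered exponent is 18, in agreement with the text.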



How do we express that the algorithm successfully computes $\log_g x$? We are
very generous: we say that the algorithm is successful if a “collision” $u = v$
occurs for two previously computed results $u$ and $v$ for which “$u = v$ is not
trivial”. If we computed $y_1 = y_{-1} \cdot y_{-1}^{-1}$, $y_2 = y_0 \cdot y_0^{-1}$, then $y_1 = y_2$ would be
trivial. We will make this precise in a minute.

The type of computation shown in the table above could be called an “arithmetic
group circuit with inputs 1, $g$, and $x$”. We abbreviate the assignment
$y_k \leftarrow y_i \cdot y_j^{\pm 1}$ as $(i, j, \pm 1)$, and also trace the exponents of $g$ and $x$ in the
circuit. Then we arrive at the following notion.

Definition 3. (i) An arithmetic circuit is a finite sequence $C = (I_1, \ldots, I_\ell)$
of instructions $I_k = (i, j, \varepsilon)$, with $-2 \le i, j < k$ and $\varepsilon \in \{1, -1\}$. The size of $C$
is $\ell$. Note that $C$ is not connected to any particular group.

(ii) If $C = (I_1, \ldots, I_\ell)$ is an arithmetic circuit, $G$ a group and $g, x \in G$, then
the trace of $C$ on input $(g, x)$ is the following sequence $z_{-2}, z_{-1}, \ldots, z_\ell$ of
elements $z_k$ of $G$:

$z_{-2} = 1$, $z_{-1} = g$, $z_0 = x$, $z_k = z_i \cdot z_j^{\varepsilon}$ for $k \ge 1$ and $I_k = (i, j, \varepsilon)$.

(iii) For an arithmetic circuit $C = (I_1, \ldots, I_\ell)$, the trace exponents consist of
the following sequence $\tau_{-2}, \tau_{-1}, \ldots, \tau_\ell$ of linear polynomials $\tau_k$ in $\mathbb{Z}[t]$:

$\tau_{-2} = 0$, $\tau_{-1} = 1$, $\tau_0 = t$, $\tau_k = \tau_i + \varepsilon \cdot \tau_j$ for $k \ge 1$ and $I_k = (i, j, \varepsilon)$.

We think of $g$ as fixed, and also write $z_k(x)$ for the trace elements $z_k$ in (ii).

The connection between the trace and the trace exponents is clear: if $x = g^a$
and $\tau_k = c \cdot t + b$, then

$$z_k(x) = g^{\tau_k(a)} = g^{ca + b}.$$

Recall that in the exponents, we may calculate modulo the group order $d$, once
we consider a fixed group.
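The relation between the trace and the trace exponents can be checked mechanically. The sketch below is a hedged illustration: circuits are lists of triples `(i, j, eps)`, the group is instantiated as the unit group modulo an integer (an assumption made so that the code can run), each linear polynomial `c*t + b` is stored as the pair `(c, b)`, and `BSGS` encodes the ten instructions of the baby-step giant-step table. Python 3.8 or later is assumed for `pow(z, -1, modulus)`.

```python
def trace(circuit, g, x, modulus):
    """Trace z_{-2}, ..., z_l of the circuit on input (g, x), as a dict {k: z_k}."""
    z = {-2: 1, -1: g % modulus, 0: x % modulus}
    for k, (i, j, eps) in enumerate(circuit, start=1):
        zj = z[j] if eps == 1 else pow(z[j], -1, modulus)
        z[k] = z[i] * zj % modulus
    return z

def trace_exponents(circuit):
    """Trace exponents as a dict {k: (c, b)}, meaning the polynomial c*t + b."""
    tau = {-2: (0, 0), -1: (0, 1), 0: (1, 0)}
    for k, (i, j, eps) in enumerate(circuit, start=1):
        tau[k] = (tau[i][0] + eps * tau[j][0], tau[i][1] + eps * tau[j][1])
    return tau

# Instructions y_1, ..., y_10 of the baby-step giant-step example.
BSGS = [(0, -1, 1), (1, -1, 1), (2, -1, 1), (3, -1, 1), (4, -1, 1),
        (5, 0, -1), (6, 6, 1), (7, 6, 1), (8, 6, 1), (9, 6, 1)]
```

Note that `trace_exponents` never touches a group: the exponents depend only on the circuit, which is what allows the lower bound arguments to work with polynomials alone.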

Example 4. Here are two more examples of trivial collisions.

(i) We take $g, x = g^a$ in a group of order $d$, and an arithmetic circuit which
computes $y_m = g^d$ with an addition chain of some length $m$, and also $y_{2m} =
x^d$. Then $\tau_m = d$ and $\tau_{2m} = dt$, $y_m = g^d = 1 = x^d = y_{2m}$, and we take
the congruence $\tau_m - \tau_{2m} \equiv 0 \bmod d$ as an indicator for the triviality of this
collision.

(ii) Now let $q$ be an arbitrary prime divisor of $d$, maybe a small one, and assume
that $d \ne q$. Again we calculate some $y_m = g^{d/q}$ and $y_{2m} = x^{d/q}$. Now both
results lie in the subgroup $H = \langle g^{d/q} \rangle$ of order $q$, and we can find a collision
with a further $q$ (or even $O(\sqrt{q})$) steps. But we have only calculated a
discrete logarithm in $H$, not in $G$. If, say, $q = 2$, then $y_m = g^{d/2} \ne 1$
and $y_{2m}$ is either $y_m$ or 1. Thus we have a collision, either $y_{-2} = y_{2m}$ or
$y_m = y_{2m}$. ◇

How do we express that “$u = v$ is trivial”? We certainly want to say that
“the collision $y_i = y_j$ is trivial” if $\tau_i = \tau_j$, or even if $\tau_i \equiv \tau_j \bmod d$, but this is
not quite enough. We have to rule out unpleasant cases like the one at the end
of Example 4, where a collision occurs but the discrete logarithm is not really
computed.

Definition 5. Let $C$ be an arithmetic circuit of size $\ell$, $G = \langle g \rangle$, $q$ an arbitrary
divisor of the group order $d = \#G$, and $i, j \le \ell$.

(i) Then $(i, j)$ is said to respect $q$ if and only if $\tau_i - \tau_j \not\equiv 0 \bmod q$.

(ii) If on input some $g, x \in G$, a collision $y_i = y_j$ occurs, then this collision
respects $q$ if and only if $(i, j)$ respects $q$.

Thus we have the linear polynomial $\tau_i - \tau_j \in \mathbb{Z}[t]$ which is nonzero modulo
$q$, hence modulo $d$, and if a collision occurs for $x = g^a$, then $g^{\tau_i(a)} = z_i(x) =
z_j(x) = g^{\tau_j(a)}$, so that $(\tau_i - \tau_j)(a) \equiv 0 \bmod d$.

If $q_1 \mid q_2 \mid d$, and $(i, j)$ respects $q_1$, then it also respects $q_2$.

Example 4 continued. (ii) For $q = 2$, we have $\tau_m = d/2$, $\tau_{2m} = dt/2$, and
$\tau_m - \tau_{2m} = d/2 \cdot (1 - t) \bmod d$. We assume that $d$ is not a power of 2, and take
a prime divisor $q \ne 2$ of $d$. Then $q$ divides $d/2$, and $\tau_m - \tau_{2m} \equiv 0 \bmod q$. Thus
$(m, 2m)$ does not respect $q$, and if on some input $x$ from some group $G$, the
collision $g^{d/2} = z_m(x) = z_{2m}(x) = x^{d/2}$ occurs, then this does not respect $q$,
either. ◇

Definition 6. Let $G = \langle g \rangle$ be a finite cyclic group, $C$ an arithmetic circuit, and
$q$ an arbitrary divisor of the group order $d = \#G$. Then the success rate $\sigma_{C,q}$
of $C$ over $G$ respecting $q$ is the fraction of group elements for which a collision
respecting $q$ occurs:

$$\sigma_{C,q} = d^{-1} \cdot \#\{x \in G : \text{on input } x \text{, a collision respecting } q \text{ occurs in } C\}.$$

Thus $0 \le \sigma_{C,q} \le 1$, and a circuit for which a collision respecting $q$ occurs
for every input $x$ has $\sigma_{C,q} = 1$. If $q_1 \mid q_2 \mid d$, then $\sigma_{C,q_1} \le \sigma_{C,q_2}$. Example 2
indicates that the baby-step giant-step algorithm gives a circuit of size $O(\sqrt{d})$,
where $d = \#G$ and $\sigma_{C,d} = 1$. For simplicity, our notation does not reflect the
dependence of the success rate on the group.
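For small groups the success rate can be computed by brute force over all inputs. The following sketch is illustrative only: it instantiates the group as the unit group modulo an integer with a given generator, stores positions $-2, -1, 0, 1, \ldots$ at list offsets $0, 1, 2, 3, \ldots$, and represents each trace exponent as a pair of coefficients; these choices and all names are assumptions made for the example. Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
from fractions import Fraction
from itertools import combinations

def run(circuit, g, x, mod):
    """Return (trace, trace exponents); exponent c*t + b is stored as (c, b)."""
    z = [1, g % mod, x % mod]
    tau = [(0, 0), (0, 1), (1, 0)]
    for i, j, eps in circuit:
        zj = z[j + 2] if eps == 1 else pow(z[j + 2], -1, mod)
        z.append(z[i + 2] * zj % mod)
        tau.append((tau[i + 2][0] + eps * tau[j + 2][0],
                    tau[i + 2][1] + eps * tau[j + 2][1]))
    return z, tau

def respects(tau_i, tau_j, q):
    """True if tau_i - tau_j is not the zero polynomial modulo q."""
    return (tau_i[0] - tau_j[0]) % q != 0 or (tau_i[1] - tau_j[1]) % q != 0

def success_rate(circuit, g, mod, d, q):
    """Fraction of inputs x = g^a, 0 <= a < d, with a collision respecting q."""
    hits = 0
    for a in range(d):
        z, tau = run(circuit, g, pow(g, a, mod), mod)
        if any(z[i] == z[j] and respects(tau[i], tau[j], q)
               for i, j in combinations(range(len(z)), 2)):
            hits += 1
    return Fraction(hits, d)

BSGS = [(0, -1, 1), (1, -1, 1), (2, -1, 1), (3, -1, 1), (4, -1, 1),
        (5, 0, -1), (6, 6, 1), (7, 6, 1), (8, 6, 1), (9, 6, 1)]
TRIVIAL = [(-1, -1, -1), (0, 0, -1)]   # computes g*g^{-1} and x*x^{-1}
```

On the unit group modulo 25 the baby-step giant-step circuit succeeds on every input, while the two-instruction circuit that only produces trivial collisions succeeds just on the inputs $x = 1$ and $x = g$, where the input itself collides with one of the starting elements.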

Also, the Pohlig-Hellman algorithm is a generic algorithm. But index calculus
in $G = \mathbb{F}_p^{\times}$ is not generic; it makes essential use of the representation of the
elements of $G$ as integers less than $p$, and the ability to compute with these
integers, say to check whether they factor over the factor base.

3 The Deterministic Lower Bound 

“Nonzero preservation” is a generally useful tool. It says that the value of a 
nonzero polynomial at a random point is likely to be nonzero. It is well-known 
over integral domains; we need a slight generalization here. See Shoup (1997) for 
a more general version. 

Lemma 7. Let $d \ge 2$ be an integer, $p^e$ a prime power divisor of $d$, where $p$ is a
prime, and $\tau = c_1 t + c_0 \in \mathbb{Z}[t]$ a linear polynomial with $\tau \not\equiv 0 \bmod p^e$. Then

$$\#\{a \in \mathbb{Z}_d : \tau(a) \equiv 0 \bmod p^e\} \le d/p.$$

Proof. Let $i \ge 0$ be the largest exponent with $\tau \equiv 0 \bmod p^i$. Thus $i < e$,
and we can write $\tau = p^i \cdot (c_1' t + c_0')$, with $c_0', c_1' \in \mathbb{Z}$ and at least one of
them nonzero modulo $p^{e-i}$. If $c_1' \equiv 0 \bmod p$, then there is no $a \in \mathbb{Z}_d$ with
$\tau(a) \equiv 0 \bmod p^{i+1}$, let alone modulo $p^e$. Otherwise there is exactly one $a_0 \in \mathbb{Z}_p$
with $c_1' a_0 + c_0' \equiv 0 \bmod p$, namely $a_0 = -c_0' \cdot c_1'^{-1} \bmod p$. The residue class
mapping $\mathbb{Z}_d \rightarrow \mathbb{Z}_p$ maps any $a \in \mathbb{Z}_d$ to $a \bmod p$. Exactly $d/p$ elements of $\mathbb{Z}_d$
are mapped to the same element of $\mathbb{Z}_p$. Now if $p^i(c_1' a + c_0') = \tau(a) \equiv 0 \bmod p^e$,
then $c_1' a + c_0' \equiv 0 \bmod p$, and hence $a \bmod p = a_0$. There are exactly $d/p$ such
$a$, and the claim follows. □
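The bound of Lemma 7 is easy to confirm exhaustively for small parameters. The sketch below is only a sanity check under the assumption that it suffices to range over coefficients reduced modulo $p^e$ (the root condition depends only on those residues); the function names are hypothetical.

```python
def count_roots(c1, c0, d, pe):
    """Number of a in Z_d with c1*a + c0 = 0 modulo pe."""
    return sum(1 for a in range(d) if (c1 * a + c0) % pe == 0)

def lemma7_holds(d, p, e):
    """Check #{a in Z_d : tau(a) = 0 mod p^e} <= d/p for all tau nonzero mod p^e."""
    pe = p ** e
    if d % pe != 0:
        raise ValueError("p^e must divide d")
    for c1 in range(pe):
        for c0 in range(pe):
            if c1 == 0 and c0 == 0:
                continue      # tau = 0 mod p^e is excluded by the hypothesis
            if count_roots(c1, c0, d, pe) > d // p:
                return False
    return True
```

The bound is attained, for example, by the polynomial $2t$ with $d = 20$ and $p^e = 4$: its roots modulo 4 are exactly the even residues, of which there are $10 = d/p$.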

Theorem 8. Let $G = \langle g \rangle$ be a finite cyclic group, $q = p^e$ a prime power divisor
of the group order $d = \#G$, $C$ an arithmetic circuit over $G$ of size $\ell$, and $\sigma_{C,q}$
its success rate respecting $q$. Then

$$\ell \ge \sqrt{2\sigma_{C,q}\, p} - 3.$$

When $\sigma_{C,q}$ is a positive constant, then $\ell \in \Omega(\sqrt{p})$.

Proof. On some input $x$, a collision in $C$ is of the form $y_i(x) = y_j(x)$ with
$-2 \le i < j \le \ell$. There are $(\ell + 2)(\ell + 3)/2$ such $(i, j)$. Any $(i, j)$ which respects $q$
leads to a collision for at most $d/p$ values of $x$, by Lemma 7, since the exponents
$a \in \mathbb{Z}_d$ correspond bijectively to the group elements $x = g^a$. Thus the total
number of possible collisions respecting $q$ is at most $(\ell + 2)(\ell + 3)/2 \cdot d/p$, and
hence

$$\sigma_{C,q} \le (\ell + 2)(\ell + 3)/2p,$$

$$(\ell + 3)^2 > (\ell + 2)(\ell + 3) \ge 2\sigma_{C,q}\, p. \quad □$$
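The two inequalities in this proof translate into small numeric helpers. The sketch below is an illustration with hypothetical function names: one function evaluates the size lower bound of Theorem 8 for a given success rate and prime, the other evaluates the counting bound on the success rate achievable by a circuit of a given size.

```python
import math

def theorem8_lower_bound(sigma, p):
    """Lower bound sqrt(2 * sigma * p) - 3 on the size of a circuit
    with success rate sigma respecting a power of the prime p."""
    return math.sqrt(2 * sigma * p) - 3

def max_success_rate(size, p):
    """Upper bound (size + 2)(size + 3) / (2p) on the success rate, capped at 1."""
    return min(1.0, (size + 2) * (size + 3) / (2 * p))
```

For instance, with $p = 101$ the counting bound stays below 1 for every size up to 11 and reaches 1 at size 12, which is consistent with the lower bound $\sqrt{202} - 3 \approx 11.2$ for success rate 1.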

The various well-known algorithms yield an $O(n\sqrt{p} + n^2)$ upper bound for
discrete logarithm computations, and we now have a lower bound $\Omega(\sqrt{p})$ where
$p$ is the largest prime divisor of $d$. In what follows, we derive upper and lower
bounds that differ only by a constant factor. We start with a lower bound different
from Theorem 8, namely $\Omega(n)$. This is not of direct cryptographic interest,
since $n \approx \log_2 d$ is roughly the “key length” or “input length”, in contrast to $\sqrt{p}$,
which will usually be chosen so that it is exponentially large in $n$. The interest is
a desire to understand the complexity of discrete logarithms as well as possible.

Theorem 9. Let C be an arithmetic circuit of size £, G = (g) a cyclic group of 
order d > 3, with aq,d = 1; and let n = [log 2 dj -I- 1 be the binary length of d. 
Then 

and hence £ G Q{n). 

Proof. Any element $a$ of $\mathbb{Z}_d$ has exactly one balanced representative
$b \in \mathbb{Z}$ with $a = (b \bmod d)$, $-d/2 < b \le d/2$.

For $-2 \le k \le \ell$, we write the trace exponent $\tau_k \in \mathbb{Z}_d[t]$ as
$\tau_k = (c_k \bmod d) \cdot t + (b_k \bmod d)$, where $c_k, b_k \in \mathbb{Z}$ are
balanced representatives. By induction on $k$ it follows that
$|b_k|, |c_k| \le 2^k$ for $0 \le k \le \ell$ (and $|b_k|, |c_k| \le 1$ for
$k = -2, -1$). Now let $a_0 = \lfloor \sqrt{d} \rfloor$, $a = (a_0 \bmod d) \in \mathbb{Z}_d$,
and $x = g^a \in G$. The assumption $\sigma_{C,d} = 1$ implies that there are
$i, j \le \ell$ with $\tau_i - \tau_j \ne 0 \bmod d$ and $(\tau_i - \tau_j)(a) = 0 \bmod d$.

We let

$$u = (c_i - c_j) \cdot a_0 + (b_i - b_j) \in \mathbb{Z}.$$

The above implies that $u = 0 \bmod d$.

If $c_i = c_j$, then $b_i = b_j \bmod d$ and $\tau_i - \tau_j = 0 \bmod d$, which is ruled
out. Thus $c_i \ne c_j$. If $u = 0$, then

$$\sqrt{d} - 1 < |a_0| = \frac{|b_i - b_j|}{|c_i - c_j|} \le |b_i| + |b_j| \le 2^{\ell+1}.$$

If $u \ne 0$, then $|u| \ge d$, and

$$2^{\ell+1}(\sqrt{d} + 1) = 2^{\ell+1}\sqrt{d} + 2^{\ell+1} \ge |c_i - c_j|\,a_0 + |b_i - b_j|
\ge |(c_i - c_j)a_0 + (b_i - b_j)| = |u| \ge d,$$

$$2^{\ell+1} \ge \frac{d}{\sqrt{d} + 1} > \sqrt{d} - 1.$$

Thus $\ell \ge \log(\sqrt{d} - 1) - 1$ in both cases. The claim now follows from

$$\log(\sqrt{d} - 1) \ge \tfrac{1}{2}\log d - \tfrac{1}{2} \ge \tfrac{1}{2}\lfloor \log d \rfloor - \tfrac{1}{2} = \tfrac{n}{2} - 1$$

for $d \ge 12$. (One checks the cases $3 \le d \le 11$ separately.) $\Box$
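The last inequality of the proof is easy to check numerically. The sketch below assumes base-2 logarithms, as suggested by the definition of the binary length $n$; the helper names are ours.

```python
import math


def binary_length(d: int) -> int:
    """n = floor(log2 d) + 1, the binary length of d."""
    return d.bit_length()


def final_inequality_holds(d: int) -> bool:
    """Check log2(sqrt(d) - 1) >= n/2 - 1, the last step of the proof of Theorem 9."""
    return math.log2(math.sqrt(d) - 1.0) >= binary_length(d) / 2.0 - 1.0


def first_failure(lo: int, hi: int):
    """Return the first d in [lo, hi) violating the inequality, or None."""
    for d in range(lo, hi):
        if not final_inequality_holds(d):
            return d
    return None
```

Calling `first_failure(12, 20000)` scans a range of orders above the threshold used in the proof; small orders such as $d = 8$ do violate the inequality, which is why they are treated separately.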

For an upper bound in our model, we just compute $x^{d/p}$ and $g^{d/p}$ and then
perform a baby-step giant-step search in the subgroup of $p$ elements.
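A minimal sketch of this upper-bound strategy is given below. It works in the multiplicative group modulo a prime, which is our choice of concrete group and not part of the paper's generic model; the function names are hypothetical, and `pow` with a negative exponent requires Python 3.8 or later.

```python
import math


def bsgs(g: int, x: int, order: int, modulus: int):
    """Return a in [0, order) with g**a == x (mod modulus), or None if none exists."""
    m = math.isqrt(order - 1) + 1            # ceil(sqrt(order)) for order >= 1
    baby = {}
    cur = 1
    for j in range(m):                       # baby steps: g^0, g^1, ..., g^(m-1)
        baby.setdefault(cur, j)
        cur = cur * g % modulus
    step = pow(g, -m, modulus)               # g^(-m), the giant step
    gamma = x % modulus
    for i in range(m):
        if gamma in baby:                    # x * g^(-i*m) == g^j
            return (i * m + baby[gamma]) % order
        gamma = gamma * step % modulus
    return None


def dlog_mod_p(g: int, x: int, d: int, p: int, modulus: int):
    """Discrete log of x to base g, reduced mod p, where p divides the order d of g."""
    h = pow(g, d // p, modulus)              # generator of the subgroup of order p
    y = pow(x, d // p, modulus)              # image of x in that subgroup
    return bsgs(h, y, p, modulus)
```

For example, 2 generates the group of order 100 modulo 101, so `dlog_mod_p(2, pow(2, 37, 101), 100, 5, 101)` recovers the exponent 37 modulo 5. The two exponentiations cost on the order of $\log(d/p)$ group operations each and the search about $2\sqrt{p}$.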

The total cost is $2(n + \sqrt{p})$, and we have the lower bounds of $n/2$ and
$\sqrt{2p}$, approximately. Thus the gap is a factor of about 4 or $\sqrt{2}$,
depending on whether $n$ or $\sqrt{p}$ is larger. We can obtain a specific estimate
as follows.



Corollary 10. Let $G$ be a cyclic group with $d$ elements,
$n = \lfloor \log_2 d \rfloor + 1$ the binary length of $d$, $p$ the largest prime
divisor of $d$, $e$ the multiplicity of $p$ in $d$,

$$m = \max\{\sqrt{2p} - 3,\; n/2 - 2\},$$

and assume that $m \ge 37$. Then there exists an arithmetic circuit $C$ with success
rate $\sigma_{C,p^e} = 1$ over $G$ and size at most $10m$. Any circuit $C$ with
$\sigma_{C,p^e} = 1$ has size at least $m$.

Proof. The last claim follows from Theorems 8 and 9. For $C$ we take the circuit
described above. Then $\sigma_{C,p^e} = 1$, and its size $\ell$ is at most
$2 \cdot 2\log(d/p) + 2\sqrt{p}$. Thus

$$\ell \le 4\log d + 2\sqrt{p} \le 8(n/2 - 2) + \sqrt{2}\cdot(\sqrt{2p} - 3) + 17 + 3\sqrt{2}$$

$$\le (8 + \sqrt{2})m + 17 + 3\sqrt{2} \le 10m. \qquad \Box$$
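The role of the assumption on $m$ in the last step can be seen numerically. The sketch below uses helper names of our own choosing: it evaluates $m$ and the slack $10m - ((8+\sqrt{2})m + 17 + 3\sqrt{2})$, which turns non-negative at $m = 37$.

```python
import math


def lower_bound_m(d: int, p: int) -> float:
    """m = max(sqrt(2p) - 3, n/2 - 2), with n the binary length of d."""
    n = d.bit_length()
    return max(math.sqrt(2.0 * p) - 3.0, n / 2.0 - 2.0)


def slack(m: float) -> float:
    """10m minus the intermediate bound (8 + sqrt(2)) m + 17 + 3 sqrt(2)."""
    root2 = math.sqrt(2.0)
    return 10.0 * m - ((8.0 + root2) * m + 17.0 + 3.0 * root2)
```

The slack is negative for $m = 36$ and positive for $m = 37$, so the final inequality of the proof is used exactly in the range where it holds.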



In usual models of computation, upper bounds come from algorithms — the 
real thing — and lower bounds impose barriers on improving these. But here, 
the lower bound is the real thing, and the upper bound a barrier on deriving 
better ones. As stated before, the above circuit cannot claim to actually compute 
discrete logarithms in G. 
4 Probabilistic Arithmetic Circuits

We now have a model for discrete log computations with essentially matching 
upper and lower bounds. However, Pollard’s rho method works in any group, 
but does not fit into our model because it makes random choices. The method 
does not make progress over the baby-step giant-step method in terms of time 
(= size of circuit), but cuts the space dramatically down to a constant. Space 
is not accounted for in our model, but we now adapt it to allow probabilistic 
choice. Once the model is appropriately set up within our framework, it is easy 
to obtain the same lower bound as before. Thus random choices do not help, in 
this specific sense. 

We allow two types of random choices in our algorithms: random group elements,
and random exponents. For the first, we might allow a new instruction

$$y_k \leftarrow \mathrm{rand}(G)$$

which assigns an element of $G$ to $y_k$. On executing the circuit, this element
is chosen uniformly at random, independent of other executions of the circuit.
Actually, this feature is not used in any discrete log algorithm that we are aware
of. For the corresponding trace exponent, we take new variables $t_1, \ldots, t_s$ if
$s$ instructions rand($G$) occur. Thus $\tau_k = t_i$ if $t_1, \ldots, t_{i-1}$ have
been used so far. But actually this feature is not required, because the next one
subsumes it.

We also want to allow random exponents, that is, an element $y^e$ with random
$e$ and previously computed $y$. When $y = g$, this may be thought of as a random
element of $G$ with known discrete logarithm. So, as a new feature we allow our
circuits to use a string

$$b = (b_1, \ldots, b_r) \in \{0,1\}^r$$

of random bits, via assignments

$$y_k \leftarrow y_i^{b_j}$$

with $i < k$ and $1 \le j \le r$. The corresponding trace exponent is

$$\tau_k = b_j \cdot \tau_i.$$



Example 11. In a probabilistic version of Pollard's rho method, the next element
$y_{k+1}$ is calculated as one of $y_k^2$, $y_k \cdot g$, or $y_k \cdot x$, each with
probability 1/3. This is easy to simulate, using two random bits $b$ and $c$. If we set






(12) 



then $y_{k+1}$ will take one of the three required values for
$(b, c) = (0, 0), (0, 1), (1, 0)$, respectively. We set the probability of
$(b, c) = (1, 1)$ to 0. The formula can be implemented with an arithmetic circuit
of size 11. In another version of Pollard's rho method, one divides the group into
three parts and makes the three-fold choice according to where $y_k$ has landed.
This does not fit into our model. $\Diamond$
Definition 13. (i) A probabilistic arithmetic circuit is a pair $C = (C_r, u)$
consisting of a probability distribution $u$ on $\{0,1\}^r$ for some nonnegative
integer $r$ and an arithmetic circuit $C_r$ as in Definition 3 except that in
addition the following type of assignment is allowed:

$$y_k \leftarrow y_i^{b_j} \qquad (14)$$

with $-2 \le i < k$ and $1 \le j \le r$.

(ii) The size $\ell$ of $C$ is the number of group operations performed. Operations
of the type (14) are not counted. (Formally, we might give appropriate rational
indices $k$ to their results.)

(iii) If $b \in \{0,1\}^r$ is provided, then we obtain a circuit $C(b)$ of size $\ell$
as follows. If $y_k$ is given by (14), then we replace all references to $y_k$ by a
reference to $y_{-2}$ $(= 1)$ if $b_j = 0$, and by a reference to $y_i$ if $b_j = 1$.
This replacement is performed recursively starting at the beginning of the
instruction list until no more references to an assignment of type (14) exist. The
new instructions are denoted as $y_k(b) \leftarrow y_i(b) \cdot y_j(b)^{\pm 1}$.

(iv) For a divisor $q$ of $d$, the success rate of $C$ with respect to $q$ is

$$\sigma_{C,q} = \sum_{b \in \{0,1\}^r} u(b) \cdot \sigma_{C(b),q}.$$

Thus $\sigma_{C,q}$ is the average success rate of the $C(b)$ for random $b$. Recall that

$$\sigma_{C(b),q} = d^{-1} \cdot \#\{x \in G : \text{there is a collision } y_i(b)(x) = y_j(b)(x) \text{ respecting } q\},$$

and such a collision respects $q$ if and only if $\tau_i(b) - \tau_j(b) \ne 0 \bmod q$,
with the usual trace exponents $\tau_i(b), \tau_j(b) \in \mathbb{Z}[t]$. These are
defined only when some $b \in \{0,1\}^r$ is fixed, not for $C$ itself.
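The following sketch illustrates how the trace exponents of $C(b)$ can be computed once the random bits are fixed, and how the success rate averages over $b$. Several conventions are assumptions on our part, since Definition 3 is not restated here: the inputs are taken to be $y_{-2} = 1$, $y_{-1} = g$, $y_0 = x = g^t$; instructions are encoded as tuples; and, unlike in the definition, assignments of type (14) receive ordinary integer indices. The code evaluates the bits directly instead of performing the recursive reference replacement.

```python
def trace_exponents(instructions, bits):
    """Map each index k to (c_k, b_k), meaning tau_k = c_k * t + b_k for the fixed bits.

    Assumed conventions (not taken from the source): y_{-2} = 1, y_{-1} = g,
    y_0 = x = g^t, and every instruction receives the next index 1, 2, ...
    """
    tau = {-2: (0, 0), -1: (0, 1), 0: (1, 0)}
    for k, instruction in enumerate(instructions, start=1):
        kind = instruction[0]
        if kind == "mul":            # y_k <- y_i * y_j^e with e in {1, -1}
            _, i, j, e = instruction
            tau[k] = (tau[i][0] + e * tau[j][0], tau[i][1] + e * tau[j][1])
        elif kind == "rand":         # type (14): y_k <- y_i^{b_j}
            _, i, j = instruction
            bit = bits[j - 1]
            tau[k] = (bit * tau[i][0], bit * tau[i][1])
        else:
            raise ValueError(f"unknown instruction kind: {kind!r}")
    return tau


def average_success_rate(u, success_rate_of):
    """sigma_{C,q} = sum over b of u(b) * sigma_{C(b),q}; u maps bit tuples to probabilities."""
    return sum(prob * success_rate_of(b) for b, prob in u.items())
```

With the circuit `[("rand", 0, 1), ("mul", 1, -1, 1)]`, the second element has trace exponent $t + 1$ when $b_1 = 1$ and the constant 1 when $b_1 = 0$.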

Theorem 15. Let $G = \langle g \rangle$ be a finite cyclic group, $p$ a prime divisor
of the group order $d = \#G$, $C$ a probabilistic arithmetic circuit of size $\ell$,
and $\sigma_{C,p}$ its success rate respecting $p$. Then

$$\ell \ge \sqrt{2\sigma_{C,p}\,p} - 3.$$

When $\sigma_{C,p}$ is a constant, then $\ell \in \Omega(\sqrt{p})$.

Proof. The probabilistic circuit $C = (C_r, u)$ and each (deterministic) circuit
$C(b)$ have size $\ell$. From the proof of Theorem 8, we have

$$\frac{(\ell+3)^2}{2p} \ge \sigma_{C(b),p}$$

for each $b \in \{0,1\}^r$. Hence

$$\frac{(\ell+3)^2}{2p} = \frac{(\ell+3)^2}{2p} \sum_{b \in \{0,1\}^r} u(b) \ge \sum_{b \in \{0,1\}^r} u(b)\,\sigma_{C(b),p} = \sigma_{C,p}. \qquad \Box$$
Abstract. This paper presents a new mathematical model of AIMD
(Additive Increase Multiplicative Decrease) TCP for general networks 
that we believe is better than those previously used when it is driven 
by bottleneck capacities. Extending the paper by Edmonds, Datta, and 
Dymond that solves the single bottleneck case, we view AIMD as a 
distributed scheduling algorithm and prove that with extra resources, it 
is competitive against the optimal global algorithm in minimizing the 
average flow time of the jobs. 

Keywords: AIMD, TCP, online competitive ratio, flow time, fairness, 
multi-bottleneck. 



1 Introduction 

AIMD (Additive Increase Multiplicative Decrease) is the core algorithmic
component of TCP (Transmission Control Protocol) for allocating bandwidth or
transmission rate to the different jobs. In this algorithm, each job $J_i$ increases
his bandwidth linearly at a rate of $\delta b_{i,t}/\delta t = \alpha$ (typically
$\alpha = 1$) until he detects that one of the bottlenecks that his transmission
passes through has reached capacity, at which point he cuts his bandwidth by a
multiplicative factor of $\beta$ (typically $\beta = 1/2$).

This simple algorithm is understood quite well when the network is restricted 
to a single bottleneck. [3] proves that even though each sender has no global 
knowledge of the state of the network, the allocation converges quite quickly to 
EQUI, which partitions the bandwidth equally between the active jobs. Though 
this is fair to all users, it does not perform well at minimizing the average 
flow/response/waiting time of the jobs, which is the standard measure both in 
the systems and the scheduling communities. In fact, [14] proves the competitive 
ratio of this online, non-clairvoyant scheduler can be as bad as when 

measured against the optimal all-powerful, all knowing, off-line scheduler, which 
in this case is Shortest Remaining Work First. When there is such a negative 
result, a typical way to prove that the scheduler does perform well is to give 
it some extra resources before comparing it to the optimal scheduler, [8]. (See 
Section 4 for additional motivation.) [4] does this, proving that EQUI is
$(2+\epsilon)$-speed $O(1 + 1/\epsilon)$-competitive, meaning that when EQUI is given
$2+\epsilon$ times as much bandwidth, it performs within a constant factor of the
optimal. AIMD, however, is different from EQUI. Its allocations continually
increase and decrease and it takes some time for it to reconverge after jobs arrive
or depart. [6] proves that if AIMD is given a constant number of adjustment periods
per job to converge then it is also $O(1)$-speed $O(1)$-competitive.

The main purpose of this paper is to extend these results to the
multi-bottleneck case. There is little work done in this area. It is much harder,
because it is not at all clear what either the steady state of AIMD, "EQUI", or
the optimal are. We help to answer each of these three questions.

Surprisingly there has not previously been a model of how AIMD changes or
to what it converges. Kelly in [12,10] does a good job, but the algorithm
considered there is different. In that AIMD, the frequency at which a bottleneck
drops packets, instructing its jobs to decrease their bandwidth, is a fixed function
that depends only on the current total traffic through the bottleneck in question.
In contrast, in the standard AIMD algorithm for TCP, a bottleneck instructs 
its jobs to back off only when it reaches its capacity. The frequency at which 
this occurs is a much more complex function of what the other bottlenecks are 
doing. In Section 2, we define a new continuous model of how AIMD evolves on 
a general network within this setting and also define the scheduler, AIMDEQUI, 
to be that to which it converges. 

Because different jobs pass through different bottlenecks, the notion of the
fairness of bandwidth allocation is not well defined. Section 3 considers three
notions of fairness. According to a socialist view of fairness, [7] proves that AIMD
can be unfair by a factor of $m$, where $m$ is a bound on the number of bottlenecks
that a job goes through. We show that according to a local view of fairness, it is
never more than a factor of $m$ unfair and that according to a free market view,
it is perfectly fair.

Finally, Section 4 proves that AIMDEQUI is $O(m^2)$-speed $O(m)$-competitive,
meaning that with $O(m^2)$ times the bandwidth, the flow time under AIMDEQUI
is within a factor of $O(m)$ of that of the optimal all-knowing scheduler. We believe
that it is not unreasonable to assume that $m$ is a constant because within the
actual internet no transmission hops more than a half dozen times. We are also
able to prove that AIMDEQUI is $O(1)$-speed $O(1)$-competitive independent of
$m$. However, this result requires the assumption that the adjustment frequencies
of the bottlenecks do not change much within the life of an individual job. This
we believe is a reasonable assumption because the adjustment frequencies are
a global property that should not be greatly affected by the arrival and the
completion of individual jobs. We believe that the result is true without this
assumption, or minimally when given speed $s = O(m)$; however, as of yet this
has been unattainable.
2 The Continuous AIMD Model for General Networks

In this section, we propose two new models of AIMD through a general network. 
The first model is a set of differential equations similar to those given by Kelly 
in [12,10]. We argue, however, that ours is a better model of AIMD when it is 
driven by bottleneck capacities. Unlike Kelly, however, we are unable to prove 
that the system converges, though we have strong arguments that it does. To 
avoid this problem, we will simply define another model, denoted AIMDEQUI, 
which is the previous model at its steady state. It is this second model that we 
prove is competitive against the optimal bandwidth scheduling algorithm. We 
use the following notation: 

- $\mathcal{B}$ is the set of routers that act as bottlenecks, the $k$-th of which
has maximum bandwidth $B_k$. When the scheduler has "speed" $s$, this maximum
bandwidth is increased to $s \cdot B_k$.

- $\mathcal{J} = \{J_i\}$ is the set of jobs (or sessions). Each job $J_i$ is defined
by its arrival time $a_i$, its file length $l_i$, and, as done in [12,10], the subset
of the bottlenecks $\mathcal{B}(i)$ that it passes through. Conversely let $J_t(k)$
denote the set of jobs $J_i$ that pass through the $k$-th bottleneck and are active
at time $t$. Note that as a simplifying assumption, we are ignoring the path that a
job takes through these bottlenecks and any delays caused by transmission times.
In particular, we are ignoring the fact that different jobs may have different
transmission times.

- We denote by $b_{i,t}$ the bandwidth or transmission rate used by job $J_i$ at time
$t$. The restriction for the $k$-th bottleneck is that
$\sum_{i \in J_t(k)} b_{i,t} \le sB_k$.

- We denote by $c_i$ the time that the transmission of job $J_i$ is completed.
To accomplish this, the algorithm must allocate enough bandwidth so that
$\sum_{t \in [a_i, c_i]} b_{i,t} = l_i$.

- We measure the quality of a scheduling algorithm using the average
flow/response/waiting time of the jobs, i.e. $\mathrm{Avg}_{i \in \mathcal{J}}[c_i - a_i]$.

- $\alpha$ is the additive increase and $\beta$ the multiplicative decrease parameter
set by the AIMD algorithm. Namely, each user increases his transmission rate
linearly at a constant rate of $\delta b_{i,t}/\delta t = \alpha$ (typically
$\alpha = 1$) until he detects that one of the bottlenecks that his transmission
passes through has reached capacity. At this point, the sender cuts his own rate
$b_{i,t}$ by a multiplicative factor of $\beta$ (typically $\beta = 1/2$).

- $f_{k,t}$, the adjustment frequency, will denote the instantaneous frequency at
time $t$ at which the event occurs in which the $k$-th bottleneck reaches capacity
and instructs its users to back off.

The equations relating these values are as follows. 



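$$\forall k: \quad \Big( f_{k,t} > 0 \text{ and } \sum_{i \in J_t(k)} b_{i,t} = sB_k \Big) \;\text{ or }\; \Big( f_{k,t} = 0 \text{ and } \sum_{i \in J_t(k)} b_{i,t} < sB_k \Big) \qquad (1)$$

$$\forall i: \quad \frac{\delta b_{i,t}}{\delta t} = \alpha - (1-\beta)\, b_{i,t} \sum_{k \in \mathcal{B}(i)} f_{k,t} \qquad (2)$$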
Equation 1 states that the total bandwidth through the $k$-th bottleneck is
bounded by its capacity $sB_k$. Moreover, this bottleneck instructs its users to
back off if and only if it is at capacity. Equation 2 states that each job $J_i$
continually increases his bandwidth linearly at a rate of
$\delta b_{i,t}/\delta t = \alpha$ and approximates the effect of the multiplicative
decreases. When any one of the bottlenecks that $J_i$ passes through reaches
capacity, its bandwidth $b_{i,t}$ decreases by a multiplicative factor of $\beta$,
i.e. from $b_{i,t}$ to $\beta b_{i,t}$, which is a decrease of $(1-\beta)b_{i,t}$. The
number of times that this occurs during a time period of length $\delta t$ is
$\sum_{k \in \mathcal{B}(i)} f_{k,t}\,\delta t$, for a total decrease of
$(1-\beta)\,b_{i,t} \sum_{k \in \mathcal{B}(i)} f_{k,t}\,\delta t$. Clearly,
Equation 2 is only a differential approximation of the decreases that occur at
discrete points in time. This same approximation was made in [12,10].
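As an illustration of Equation 2, a single forward-Euler step of the rate equation can be written as follows. This is a sketch under assumptions of our own: the adjustment frequencies are passed in as given numbers (the model itself determines them from Equation 1), and the names `aimd_step`, `paths` and `dt` are hypothetical.

```python
def aimd_step(b, paths, f, alpha=1.0, beta=0.5, dt=0.01):
    """One Euler step of Equation 2.

    b[i]     current bandwidth of job i
    paths[i] indices of the bottlenecks job i passes through
    f[k]     adjustment frequency of bottleneck k (assumed given)
    """
    updated = []
    for i, bandwidth in enumerate(b):
        total_frequency = sum(f[k] for k in paths[i])
        # Additive increase minus the averaged multiplicative decrease.
        rate = alpha - (1.0 - beta) * bandwidth * total_frequency
        updated.append(bandwidth + dt * rate)
    return updated
```

A job whose bandwidth already equals $\alpha / ((1-\beta)\sum_k f_k)$ is left unchanged by the step, while a job starting from zero grows at rate $\alpha$.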

The main difference between this model and Kelly's in [12,10] is that Kelly
has a single equation $f_{k,t} = \mu_k = p_k(\sum_{i \in J_t(k)} b_{i,t})$ defining a
bottleneck's adjustment frequency $f_{k,t}$ as a function of the total flow
$\sum_{i \in J_t(k)} b_{i,t}$ through the bottleneck. Though Kelly defines $\mu_k$
instead to be "the proportion of marked packets", it is used in Equation 2 in the
same way as our frequency, and we assume that this quantity reflects the proportion
of the jobs passing through the bottleneck that will adjust and hence is related to
our frequency $f_{k,t}$. Moreover, Kelly does not speak of the bottlenecks having a
capacity, but presumably this fixed non-negative, continuous, strictly increasing
function $p_k$ can be such that as this total flow increases towards the
bottleneck's "capacity", a sufficiently strong message is given to the jobs to back
off that this capacity is never exceeded.

In contrast, our model does not have a single equation defining a bottleneck's
adjustment frequency $f_{k,t}$. We feel that this is a better model for AIMD when it
is driven by bottleneck capacities, because when an individual bottleneck adjusts
in practice does depend in an intricate way on when the other bottlenecks adjust.
For example, having a job pass through a long line of $m$ bottlenecks with the
same capacities should be equivalent to passing through only one. In Kelly's
model, each of these bottlenecks will send the same message as if it were the
only bottleneck and hence the job will back off $m$ times more often. On the
other hand, in our model, it is irrelevant and undefined which one of the
bottlenecks will adjust. We can only make claims about $f_{k,t}$.

Not knowing which bottlenecks are at capacity adds extra complications.
One way to ensure that each bottleneck is at capacity is to assume that each
bottleneck $k$ has a local job $i(k)$ that goes only through the $k$-th bottleneck.
This job will be free to increase its bandwidth, filling any remaining space in the
bottleneck. This change allows us to ignore the second half of Equation 1.

Given the current bandwidth allocations $b_{i,t}$, the next values $b_{i,t+\delta t}$
are determined by first solving a system of equations for the adjustment
frequencies $f_{k,t}$ and then using these to compute $\delta b_{i,t}/\delta t$. The
following matrix notation is useful. Let $M$ denote the 0/1 matrix such that
$M_{k,i} = 1$ iff the $i$-th job is in the $k$-th bottleneck. Similarly, define the
vectors $B = (B_k)$, $f = (f_{k,t})$, $b' = (\delta b_{i,t}/\delta t)$,
$0_k = (0, \ldots, 0)$, and $1_n = (1, \ldots, 1)$. In contrast, represent the
bandwidths $b_{i,t}$ as an $n \times n$ matrix $b$ with diagonal entries $b_{i,t}$ and
the rest zero. Equations 1 and 2 translate into $M b 1_n = sB$ and
$b' = \alpha 1_n - (1-\beta)\, b M^T f$. Note there is one equation and one unknown,
$b'_i$ and $f_{k,t}$, for each job and for each bottleneck. We can solve these as
follows. Differentiating the first gives $M b' = 0_k$. Substituting the second into
this gives $M(\alpha 1_n - (1-\beta)\, b M^T f) = 0_k$, or
$\alpha M 1_n = (1-\beta)\, M b M^T f$. Solving this gives the required values

$$f = \frac{\alpha}{1-\beta}\,(M b M^T)^{-1} M 1_n.$$

These values are used in
$b' = \alpha 1_n - (1-\beta)\, b M^T f = \alpha 1_n - \alpha\, b M^T (M b M^T)^{-1} M 1_n$
to compute $b'$. These in turn give us the next values for $b_{i,t}$, namely
$b_{i,t+\delta t} = b_{i,t} + b'_i\,\delta t$. (This can't easily be represented as a
matrix because $b$ is square and $b'$ is a vector.)$^1$
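The linear system for the adjustment frequencies can be solved numerically for small networks. The sketch below is ours: it assumes that every bottleneck is at capacity and that $M b M^T$ is invertible (the text does not establish this in general), and it uses a plain Gauss-Jordan elimination so as to stay within the standard library.

```python
def solve_linear(a, y):
    """Solve a x = y for square a by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [y[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("M b M^T is singular for this network")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def adjustment_frequencies(M, b, alpha=1.0, beta=0.5):
    """f = alpha/(1-beta) * (M b M^T)^{-1} M 1_n, with b given as its diagonal."""
    K, n = len(M), len(b)
    mbmt = [[sum(M[k][i] * b[i] * M[l][i] for i in range(n)) for l in range(K)]
            for k in range(K)]
    rhs = [alpha / (1.0 - beta) * sum(M[k]) for k in range(K)]
    return solve_linear(mbmt, rhs)


def bandwidth_rates(M, b, f, alpha=1.0, beta=0.5):
    """b'_i = alpha - (1-beta) * b_i * sum of f_k over the bottlenecks of job i."""
    K = len(M)
    return [alpha - (1.0 - beta) * b[i] * sum(M[k][i] * f[k] for k in range(K))
            for i in range(len(b))]
```

For two jobs with bandwidths 1 and 3 sharing one bottleneck, the computed rates are $+0.5$ and $-0.5$, so the total through the bottleneck stays constant, as $M b' = 0_k$ requires.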

The steady state of this system occurs when $\delta b_{i,t}/\delta t = 0$. Equation 2
then gives $b_{i,t} = \frac{\alpha}{1-\beta} \big/ \big(\sum_{k \in \mathcal{B}(i)} f_{k,t}\big)$.
It is our strong belief that this system quickly converges to this state. If the
dynamic system allocates job $J_i$ an amount that is different from this, then
Equation 2 automatically moves it closer. Assume, for example, that job $J_i$ just
arrived and hence $b_{i,t_0} = 0$. If we assume that the total frequency
$f_{i,t} = \sum_{k \in \mathcal{B}(i)} f_{k,t}$ remains relatively constant for a few
adjustment periods, then the single differential equation
$\delta b_{i,t}/\delta t = \alpha - (1-\beta)\, b_{i,t} f_{i,t}$ can be solved in
isolation from the others, giving
$b_{i,(t_0+d)} = \frac{\alpha}{(1-\beta) f_{i,t}} \big(1 - e^{-(1-\beta) f_{i,t} d}\big)$.
The time until the AIMD allocation to the job is within a factor
$1 - e^{-(1-\beta)q} \approx 1 - \beta^q$ of the steady state allocation is
$d_i = q / f_{i,t}$. In the single bottleneck case, this equals $q$ adjustment
periods, which corresponds exactly to the results given in [6].
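The convergence argument rests on solving a single linear differential equation. As a sketch (function names ours), the solution of $\delta b/\delta t = \alpha - (1-\beta) f b$ with $b = 0$ at the arrival time, for a constant total frequency $f$, and the time needed to reach a given fraction of the steady state can be computed as:

```python
import math


def allocation_after(d, f_total, alpha=1.0, beta=0.5):
    """Bandwidth d time units after arrival, assuming a constant total frequency."""
    rate = (1.0 - beta) * f_total
    return alpha / rate * (1.0 - math.exp(-rate * d))


def time_to_fraction(fraction, f_total, beta=0.5):
    """Time at which the allocation reaches `fraction` (in (0, 1)) of the steady state."""
    return -math.log(1.0 - fraction) / ((1.0 - beta) * f_total)
```

The time returned by `time_to_fraction` scales as $1/f$, that is, it is a fixed number of adjustment periods regardless of how frequent the adjustments are.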

To avoid the problem of whether the system quickly converges, we will simply 
define another model, denoted AIMDEQUI, which is the previous model at its 
steady state. Replacing Equation 2 with Equation 4 gives the equations defining 
AIMDEQUI to be: 



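$$\forall k: \quad \Big( f_{k,t} > 0 \text{ and } \sum_{i \in J_t(k)} b_{i,t} = sB_k \Big) \;\text{ or }\; \Big( f_{k,t} = 0 \text{ and } \sum_{i \in J_t(k)} b_{i,t} < sB_k \Big) \qquad (3)$$

$$\forall i: \quad b_{i,t} = \frac{\alpha}{1-\beta} \Big/ \sum_{k \in \mathcal{B}(i)} f_{k,t} \qquad (4)$$

In the matrix notation, these translate into $M b 1_n = sB$ and
$b M^T f = \frac{\alpha}{1-\beta}\, 1_n$.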



$^1$ If $M$ were square and invertible then
$b' = \alpha 1_n - \alpha\, b M^T [(M^T)^{-1} b^{-1} M^{-1}] M 1_n = \alpha 1_n - \alpha 1_n = 0$.
But we already know this from $M b' = 0_k$. However, I confess, I do not fully
understand what happens when $M$ is not square or invertible. Also we need to be
able to invert $M b M^T$. For what it is worth, $M b M^T$ is positive semi-definite.
In fact, $\forall z$, $z^T [M b M^T] z \ge 0$. Does this mean that $M b M^T$ is
invertible? I have no proof that $f_{k,t}$ does not go negative, which would go
against the intuition.

3 Socialistic, Local, and Free Market Views of Fairness

It is clear what a fair distribution is of a single resource like the bandwidth
of a bottleneck. However, when different jobs are restricted by different
bottlenecks with different capacities, it is not clear what is "fair". This section
defines three views of fairness: socialistic, local, and free market, with
corresponding "Equal Partition" schedulers: SEQUI$_s$, LEQUI$_s$, and FEQUI$_s$.
AIMDEQUI will be evaluated with respect to each.

The socialistic view attempts to give each job the same bandwidth. A job is
only given more bandwidth than another if they are not in competition for this
extra bandwidth. SEQUI$_s$ achieves such a distribution of bandwidth as follows.
Starting with zero bandwidth to each job, increase the bandwidth of each job
equally, except fixing that of any job passing through a bottleneck that is at
capacity. According to this view, [7] proves that AIMD can be unfair by a factor
of $m$ to jobs that pass through $m$ bottlenecks. An open problem is to prove that
this is the worst case.

In the local view, a bottleneck never gives a job more bandwidth than is fair
from its local information. In the scheduler LEQUI$_s$, the $k$-th bottleneck tries
to allocate a fair share $sB_k / n_{k,t}$ of its bandwidth to each of the
$n_{k,t} = |J_t(k)|$ jobs that pass through it. A job, however, may not be able to
receive this much bandwidth because of the constraints of its other bottlenecks.
Therefore, LEQUI$_s$ allocates to job $J_i$ the minimum allocated by each of the
bottlenecks through which it passes, i.e.
$b_{i,t} = \min_{k \in \mathcal{B}(i)} sB_k / n_{k,t}$. This locality of the fairness
is used to reduce a schedule on the general network $G$ to one separate single
bottleneck network for each of $G$'s bottlenecks. Using this, Theorem 3 proves
that though LEQUI sometimes allocates less bandwidth than it could, it is
$O(m^2)$-speed $O(m)$-competitive. The same result automatically applies for SEQUI
because it never allocates less bandwidth to any job. Lemma 3 proves that
AIMDEQUI allocates at least $1/m$ as much. Theorem 2, stating that AIMDEQUI is
$O(m^2)$-speed $O(m)$-competitive, follows.
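The local rule described above is simple enough to state as code. In this sketch the data layout (`paths` as lists of bottleneck indices, `capacities` as a list) and the function name are our own choices.

```python
def lequi_allocation(paths, capacities, speed=1.0):
    """Give each job the smallest fair share offered by the bottlenecks on its path.

    paths[i]      indices of the bottlenecks that job i passes through
    capacities[k] capacity B_k of bottleneck k
    """
    counts = [0] * len(capacities)
    for path in paths:
        for k in path:
            counts[k] += 1                     # n_k: active jobs through bottleneck k
    return [min(speed * capacities[k] / counts[k] for k in path) for path in paths]
```

With `paths = [[0], [0, 1], [1]]` and capacities 6 and 2, the job crossing both bottlenecks is limited to 1 by the second one, while the first bottleneck still hands out only its local fair share of 3 to the other job and so carries 4 of its 6 units. This is the sense in which the local scheduler can allocate less bandwidth than it could.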

The free market view of fairness argues that it is not fair to allocate the same
bandwidth to every job when the jobs pass through different numbers of
bottlenecks with different demands on their bandwidth. Instead, in this view each
job is charged by each bottleneck it passes through for the bandwidth that it
uses at a cost which decreases in proportion to the supply, namely its capacity
$sB_k$, and increases in proportion to the demand, namely the number of jobs
$n_{k,t} = |J_t(k)|$ passing through it or perhaps the number $n^{\max}_{k,t}$ that
are most constrained by it. Then each job is allocated the same cost of bandwidth.
AIMDEQUI itself is a scheduler that, once the costs are rigged slightly, is
completely fair in this sense. The adjustment frequency $f_{k,t}$ of a bottleneck is
a reasonable cost for its bandwidth because Lemma 2 proves that it is bounded
within
$\left[\frac{\alpha}{1-\beta}\,\frac{1}{m}\,\frac{n^{\max}_{k,t}}{sB_k},\; \frac{\alpha}{1-\beta}\,\frac{n_{k,t}}{sB_k}\right]$.
Lemma 1 proves equality of this relationship on average, i.e.
$n_t = \frac{1-\beta}{\alpha} \sum_k f_{k,t}\, sB_k$. Being charged for its bandwidth
by each bottleneck it passes through, job $J_i$ is charged a total of
$\big(\sum_{k \in \mathcal{B}(i)} f_{k,t}\big) b_{i,t}$. Equation 4 then enforces that
the allocations of bandwidth are such that this charge is the same for all jobs.
The global aspect of this view of fairness is used to reduce AIMDEQUI on the
entire network $G$ to a single network with a single bottleneck. This is used to
prove Theorem 4, which states that AIMDEQUI is $O(1)$-speed $O(1)$-competitive
when these adjustment frequencies $f_{k,t}$ do not change much within the life of
an individual job.



4 The Competitiveness of AIMDEQUI 

To understand the worst-case analysis results in the literature, we need to
introduce and motivate resource augmentation analysis [8]. A scheduling algorithm
$A$ is said to be $s$-speed $c$-competitive if
$\max_{\mathcal{J}} A_s(\mathcal{J}) / OPT_1(\mathcal{J}) \le c$, where
$A_s(\mathcal{J})$ denotes the average flow time for the schedule given by $A$ with
a speed $s$ on input $\mathcal{J}$, and similarly $OPT_1(\mathcal{J})$ denotes the
flow time of the adversarial schedule for $\mathcal{J}$ with a unit speed.

Though most scheduling papers consider the allocation of a fixed number of 
processors between the active jobs, the results hold for our setting of allocating 
the fixed bandwidth of a single bottleneck network. It is shown in [4] that the 
algorithm, EQUI, which devotes an equal amount of processing power to each 
job, is a $(2+\epsilon)$-speed $O(1 + 1/\epsilon)$-competitive algorithm for scheduling of jobs with
“natural” speed-up curves. The result in the original paper [4] stated < 

jji^y This was improved in [5] for the purpose of proving Theorem 2tol-|- 
which does not change the result 0{j) when the speed s is 2 -|- e, but 
when the speed s is large, the improvement is from 2 + 0{^) to 1 + 0{^). It 
is likely that the competitive ratio should be 1 -I- O(^), but as of yet that is 
unattainable. 

To be more complete, all the results allow arbitrary "natural" speed-up
curves. For example, fully parallelizable work is the usual kind, in which the
rate at which the work gets completed is in proportion to the amount of
bandwidth/processors allocated. In contrast, sequential work gets completed at
a fixed rate independent of the amount of resources allocated. Given that any
non-clairvoyant algorithm (limited knowledge about jobs) is bound to waste lots
of resources on sequential jobs, it is surprising that the algorithm is competitive
against an all-knowing adversary when given only a little extra resources. In fact,
more general speedup curves are also allowed. One might not think that such
a result would have any direct application to the problem of transmitting files.
However, it does. Each sender may have a different upper bound on the rate at
which it can transmit data. This can be modeled by representing the transmission
with a job whose speedup function is fully parallelizable up to the sender's
capacity and then becomes sequential for any additional bandwidth allocated to
it. It is not needed, but to make the proofs simpler, Lemma 1 in [4] proves that
the worst case set of jobs is such that every phase of every job is either fully
parallelizable or sequential. Hence, we will restrict our attention to these.
Another improvement needed to prove Theorem 2 that [5] provides over [4] is that
it allows the optimal scheduler to complete the fully parallelizable work and the
sequential work independently. The formal statement needed is as follows.
Theorem 1 ([5]) Let $\mathcal{J}$ be any set of jobs in a single bottleneck network
in which each phase of each job can have an arbitrary sublinear-nondecreasing
speedup function. Then
$\frac{EQUI_{(2+\epsilon)}(\mathcal{J})}{OPT_1(\mathcal{J}_{par}) + OPT_1(\mathcal{J}_{seq})} \le O\big(1 + \frac{1}{\epsilon}\big)$,
where $\mathcal{J}_{par}$ and $\mathcal{J}_{seq}$ contain respectively only the
non-sequential and the sequential phases of the jobs $\mathcal{J}$.

The main result of this paper is that AIMDEQUI, despite being online,
non-clairvoyant, and distributed, is $O(m^2)$-speed $O(m)$-competitive.

Theorem 2 Let $G$ be any general network. Let $\mathcal{J}$ be any set of jobs in
which each phase of each job can have an arbitrary sublinear-nondecreasing speedup
function. Let $m$ denote the maximum number of bottlenecks that a job passes
through. It follows that
$\frac{AIMDEQUI_{O(m^2)}(\mathcal{J})}{OPT_1(\mathcal{J})} = O(m)$.

Proof of Theorem 2: The result follows from Theorem 3, stating that LEQUI
is $O(m^2)$-speed $O(m)$-competitive, and from Lemma 3, stating that AIMDEQUI
allocates at least $1/m$ as much bandwidth as LEQUI to each job. $\blacksquare$

Theorem 3 $\frac{LEQUI_{O(m^2)}(\mathcal{J})}{OPT_1(\mathcal{J})} = O(m)$.

Proof of Theorem 3: This proof uses the fact that LEQUI is locally fair
at each bottleneck. It is a reduction to many instances of Theorem 1. For each
bottleneck within the general network $G$, the proof reduces what occurs in that
bottleneck to a separate single bottleneck network with capacity $B_k$ on its own
job set. The proof is two pages and is quite involved. It is omitted because
of the page restriction. $\blacksquare$

The previous result can be completely tightened, giving that AIMDEQUI is
$(2+\epsilon)$-speed $O(1)$-competitive, if we assume that the adjustment frequencies
of the bottlenecks do not change much within the life of an individual job.
This we believe is a reasonable assumption because the adjustment frequencies
are a global property that should not be greatly affected by the arrival and the
completion of individual jobs.

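Theorem 4 Let $G$ be any general network. Let $\mathcal{J}$ be any set of jobs in
which each phase of each job can have an arbitrary sublinear-nondecreasing speedup
function. Suppose for each job $J_i$, the ratio
$\big(\sum_{k \in \mathcal{B}(i)} f_{k,t}\big) / \big(\sum_k f_{k,t} B_k\big)$ between
adjustment frequencies does not change by more than a factor of $r$ throughout the
life of a job, where $r \ge 1$ is some constant. It follows that
$\frac{AIMDEQUI_{r(2+\epsilon)}(\mathcal{J})}{OPT_1(\mathcal{J})} \le O\big(1 + \frac{1}{\epsilon}\big)$.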

Proof of Theorem 4: This proof uses the fact that AIMDEQUI is Free Market
Fair. It is a simple reduction to Theorem 1 by reducing everything that is
occurring within the general network $G$ to a single network with a single
bottleneck with capacity $B = 1$, namely
$\frac{AIMDEQUI_{r(2+\epsilon)}(G, \mathcal{J})}{OPT_1(G, \mathcal{J})} \le \frac{EQUI_{(2+\epsilon)}(1, \mathcal{J}')}{OPT_1(1, \mathcal{J}')} = O\big(1 + \frac{1}{\epsilon}\big)$.
The last step is a direct application of Theorem 1.

Define $F_{i,t} = \big(\sum_{k \in \mathcal{B}(i)} f_{k,t}\big) / \big(\sum_k f_{k,t} B_k\big)$
to be a needed comparison between the adjusting frequency of job $J_i$ at time $t$
and that of the overall network. By the statement of the theorem, this does not
change by more than $r$ throughout the life of the job and hence
$F_i \le F_{i,t} \le r F_i$ for some $F_i$. We construct another set of jobs
$\mathcal{J}'$ by scaling the fully parallel work in job $J_i \in \mathcal{J}$ by this
constant $F_i$.

The first step is to prove that
$AIMDEQUI_{r(2+\epsilon)}(G, \mathcal{J}) \le EQUI_{(2+\epsilon)}(1, \mathcal{J}')$.
By induction on $t$, assume that at time $t$
$AIMDEQUI_{r(2+\epsilon)}(G, \mathcal{J})$ has completed at least as much work on each
job as $EQUI_{(2+\epsilon)}(1, \mathcal{J}')$. We prove as follows that the first
algorithm allocates at least $1/F_i$ times as much bandwidth to job $J_i$ at this
time as the second does, i.e. $b^A_{i,t} \ge \frac{1}{F_i} b^E_{i,t}$. By the bound on
$F_{i,t}$ given by the theorem and that on $b^A_{i,t}$ given in Equation 4,

$$F_i \cdot b^A_{i,t} \ge \frac{1}{r} F_{i,t} \cdot b^A_{i,t}
= \frac{1}{r} \cdot \frac{\sum_{k \in \mathcal{B}(i)} f_{k,t}}{\sum_k f_{k,t} B_k} \cdot \frac{\alpha/(1-\beta)}{\sum_{k \in \mathcal{B}(i)} f_{k,t}}
= \frac{2+\epsilon}{r(2+\epsilon)\,\frac{1-\beta}{\alpha} \sum_k f_{k,t} B_k}.$$

By Lemma 1, this is $(2+\epsilon)/n^A_t$. By the induction hypothesis
$n^A_t \le n^E_t$ and hence this is at least $(2+\epsilon)/n^E_t$, which is the
bandwidth $b^E_{i,t}$ allocated by $EQUI_{(2+\epsilon)}(1, \mathcal{J}')$. Because
$b^A_{i,t} \ge \frac{1}{F_i} \cdot b^E_{i,t}$,
$AIMDEQUI_{r(2+\epsilon)}(G, \mathcal{J})$ continues to keep up. The second algorithm
has $F_i$ times as much fully parallelizable work and by definition sequential work
completes at a fixed rate independent of the number of processors allocated. This
completes the proof by induction.

The final step in the proof is to compare the optimal algorithms,
$OPT_1(G, \mathcal{J}) \ge OPT_1(1, \mathcal{J}')$. This is done by constructing
another algorithm $OPT'_1$ for which
$OPT_1(G, \mathcal{J}) = OPT'_1(1, \mathcal{J}')$. Because $OPT_1(1, \mathcal{J}')$ is
the optimal algorithm, $OPT'_1(1, \mathcal{J}') \ge OPT_1(1, \mathcal{J}')$.
$OPT'_1(1, \mathcal{J}')$ is defined to be the same as $OPT_1(G, \mathcal{J})$ except
that the bandwidth allocated to job $J_i$ is scaled by $F_i$, i.e.
$b'_{i,t} = F_i \cdot b^O_{i,t}$. $OPT_1(G, \mathcal{J}) = OPT'_1(1, \mathcal{J}')$
because both the amount of parallel work and the number of processors have been
scaled by $F_i$. What remains is to show that $OPT'_1(1, \mathcal{J}')$ does not
allocate more than a total of one bandwidth at any given time, namely

$$\sum_i b'_{i,t} = \sum_i F_i \cdot b^O_{i,t} \le \sum_i F_{i,t} \cdot b^O_{i,t}
= \sum_i \frac{\sum_{k \in \mathcal{B}(i)} f_{k,t}}{\sum_k f_{k,t} B_k} \cdot b^O_{i,t}
= \frac{\sum_k f_{k,t} \big(\sum_{i \in J_t(k)} b^O_{i,t}\big)}{\sum_k f_{k,t} B_k}.$$

Because $OPT_1(G, \mathcal{J})$ cannot exceed the capacity of the bottleneck, this
is at most $\big(\sum_k f_{k,t} B_k\big) / \big(\sum_k f_{k,t} B_k\big) = 1$. This
completes all the required steps of the proof. $\blacksquare$



Lemma 1 The number nt of jobs active at time t under AIMDEQUIg is nt = 

^-^Ykh,tBk. 



Proof of Lemma 1: nt = X^active i "''^hich by both the left and right sides of 
Equation 4 equals X* XfceB(*) fk,tbi/~^ = Xfc fkAY^(^Mk) ht), which 



by Equation 3 equals fk,tsBk- ■ 



Lemma 2 The adjustment frequency for the bottleneck is bounded by fk,t & 

„max 1 

Q 1 ^k,t 

(1 — / 3 ) m sBk ’ sBk 

The proof is a simple quarter-page algebraic argument. It is omitted because of 
the page restriction. 



Lemma 3 The bandwidth allocated by AIMDEQUIg(j7) to job Ji at time t is 
at least F that allocated by LEQUIg(j7), i.e. bi^t > minfcgB(i) It follows 



. AIMDEQUI^AJ) 

< 1 . 

Proof of Lemma 3: Consider a job Ji. Equation 4 gives (mmaxfcgg(q fk,t)h,t 
> (EfceB(d and hence 7 ^- Applying the 

bound in Lemma 2 gives > A minj.gg(-q ■ 



References 

1 . F. Baccelli and D. Hong. AIMD, Fairness and Fractal Scaling of TCP Traffic 
RR-4155 INRIA Rocquencourt and Infocom, June 2002 

2. A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cam- 
bridge University Press, 1998. 

3. D.M. Chiu and R. Jain. Analysis of the increase and decrease algorithms for con- 
gestion avoidance in computer networks. Computer networks and ISDN systems, 
17(1):1-14, 1989. 

4. Jeff Edmonds. Scheduling in the dark. In Journal of Theoretic Computer Science, 
1999 and ACM Symposium on Theory of Computing, pages 179-188, 1999. 

5. Jeff Edmonds. Scheduling in the dark - improved results: manuscript, 
http://www.cs.yorku.ca/~jeff, 2001. 

6. J. Edmonds, S. Datta, and P. Dymond. TCP is Competitive Against a Limited 
Adversary. Proc. 15th Ann. ACM Symp. on Parallelism in Algorithms and 
Architectures, pp. 174-183, 2003. 

7. S. Floyd. Connections with multiple congested gateways in packet-switched net- 
works, part I: One-way traffic. Computer communications review, 21(5):30-47, 
October 1991. 

8. Bala Kalyanasundaram and Kirk Pruhs. Speed is as powerful as Clairvoyance. 
Journal of the ACM, 47(4):617-643, 2000. 

9. R. Karp, E. Koutsoupias, C. Papadimitriou, and S. Shenker. Optimization prob- 
lems in congestion control. In IEEE Symposium on Foundations of Computer 
Science, pages 66-74, 2000. 

10. F. Kelly. Mathematical modelling of the internet. In Bjorn Engquist and Wilfried 
Schmid (Eds.), Mathematics Unlimited - 2001 and Beyond. Springer, 2001. 

11. F. Kelly. Fairness and stability of end-to-end congestion control European Control 
Conference, Cambridge, 2003 

12. F. Kelly, A. Maulloo, and D. Tan. Rate control in communication networks: shadow 
prices, proportional fairness and stability. In Journal of the Operational Research 
Society, volume 49, 1998. 

13. J. Kurose and K. Ross. Computer Networking: A Top-Down Approach Featuring 
the Internet. Addison-Wesley, 2002. 

14. R. Motwani, S. Phillips, and E. Torng. Non-clairvoyant scheduling. Theoretical 
computer science (Special issue on dynamic and on-line algorithms), 130:17-47, 
1994. 




Gathering Non-oblivious Mobile Robots 



Mark Cieliebak 

Institute of Theoretical Computer Science 
ETH Zurich 

cieliebak@inf.ethz.ch 



Abstract. We study the Gathering Problem, where we want to 
gather a set of n autonomous mobile robots at a point in the plane. This 
point is not fixed in advance. The robots are very weak, in the sense that 
they have no common coordinate system, no identities, no central co- 
ordination, no means of direct communication, and no synchronization. 
Each robot can only sense the positions of the other robots, perform a 
deterministic algorithm, and then move towards a destination point. It is 
known that these simple robots cannot gather if they have no additional 
capabilities. In this paper, we show that the Gathering Problem can 
be solved if the robots are non-oblivious, i.e., if they are equipped with 
memory. 



1 Introduction 

We consider a distributed system whose entities are autonomous mobile robots, 
where the robots can freely move in the two-dimensional plane. The coordination 
mechanism for these robots is totally decentralized, i.e., the robots are completely 
autonomous and no central control is used. The research interest is to establish a 
minimal set of capabilities the robots need to have to be able to perform a certain 
task, like forming a pattern. In this paper, we study the problem of gathering the 
robots at a point. This problem is known as the Gathering Problem (or rendezvous, 
or point-formation problem) and is obviously one of the most primitive 
tasks that a set of robots might perform. The Gathering Problem has been 
studied intensively in the literature, in particular in the realm of distributed 
computing [2, 4, 5, 7, 8], but also in robotics [3] and artificial intelligence [6]. 

We study the Gathering Problem for a set of weak robots: the robots 
are anonymous (i.e., identical), they have no common coordinate system, and 
they have no means of direct communication. All robots operate individually, 
according to the following cycle: Initially, they are in a waiting state. They wake 
up independently and asynchronously, observe the other robots’ positions, and 
compute a point in the plane. They start moving towards this point, but may 
not reach it (e.g. because of limits to the robot’s motion energy). Then they 
become waiting again. Details of the model are given in Section 2. For these 
robots, the Gathering Problem is defined as follows: 

Definition 1. Given n robots $r_1, \ldots, r_n$, arbitrarily placed in the plane, with no 
two robots at the same position, make them gather at one point. 

If the robots are asked only to move “very close” to each other, this task 
is easily solved: each robot computes the center of gravity^1 of all robots, and 
moves towards it. However, in the Gathering Problem we ask the robots to 
meet at exactly one point. 
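
The "move very close" strategy can be illustrated with a minimal sketch. It assumes a global coordinate system and a fixed fraction of the way travelled per move, neither of which the robots in the paper have; the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def center_of_gravity(points: List[Point]) -> Point:
    """Arithmetic mean of the points: c = (1/n) * sum of p_i."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def step_towards(p: Point, target: Point, fraction: float) -> Point:
    """Move p the given fraction (0..1) of the way towards target."""
    return (p[0] + fraction * (target[0] - p[0]),
            p[1] + fraction * (target[1] - p[1]))


robots = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
c = center_of_gravity(robots)
# One idealised round in which every robot covers half the distance to c.
robots_after = [step_towards(p, c, 0.5) for p in robots]
```

Repeating such rounds shrinks the distances to the center but, in general, never makes the robots coincide exactly, which is why this strategy does not solve the Gathering Problem itself.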

If the robots are oblivious, i.e., if they do not remember previous observations 
and calculations, then the Gathering Problem is unsolvable [7,8]. On the other 
hand, the problem can be solved if we change the nature of the robots: If we 
assume a common coordinate system, gathering is possible even with limited 
visibility [5]; if the robots are synchronous and movements are instantaneous, 
then the Gathering Problem has a simple solution [8] and can be achieved 
even with limited visibility [2]; finally, the problem can be solved for more than 
two robots if the robots can detect how many robots are at a certain point 
(multiplicity detection) [4]. Recently, the Gathering Problem was studied in 
the presence of faulty robots; assuming a strong model of synchronicity, the 
non-faulty robots can gather if at most one third of the robots are faulty [1]. 

In this paper, we show that the Gathering Problem is solvable for $n \ge 2$ 
non-oblivious robots. First, we present in Section 4 an algorithm that gathers 
n = 2 robots. At the beginning, the two robots move on the line $\ell$ which connects their 
initial positions, away from each other. As soon as both robots have observed 
the configuration at least once (hence, they know $\ell$), they start moving on lines 
perpendicular to $\ell$ until, again, both have seen both perpendicular lines. Finally, 
they meet on $\ell$ in the center between the two perpendicular lines. 

For more than two robots, we distinguish in Section 5 how many robots are 
on the smallest enclosing circle SEC of the positions of all robots in the initial 
configuration. If there are more than two robots on SEC, then each robot moves 
on a circle around the center of SEC until all robots have seen SEC. Hereby, 
we use the fact that the smallest enclosing circle of the robots' positions does 
not change. Then all robots gather at the center of SEC. On the other hand, if 
there are only two robots on SEC, then the robots that are not on SEC move 
perpendicular to the line $\ell$ connecting the two robots on SEC, while the robots 
on SEC move on line $\ell$ away from each other. The smallest enclosing circle 
increases, but $\ell$ remains invariant. As soon as all robots have seen line $\ell$ and the 
configuration, they gather at the intersection between $\ell$ and a line k, which is 
the median perpendicular line of the robots, if n is odd, or the center between 
the two median perpendicular lines, if n is even. 



2 Autonomous Mobile Robots 

A robot is a mobile computational unit provided with sensors, and it is viewed 
as a point in the plane. Once activated, the sensors return the set of all points in 
the plane occupied by at least one robot. This forms the current local view of the 
robot. The local view of each robot also includes a unit of length, an origin (which 
we assume w.l.o.g. to be the position of the robot in its current observation), 
and a coordinate system (e.g. Cartesian). There is no a priori agreement among 
the robots on the unit of length, the origin, or the coordinate systems. 

^1 For n points $p_1, \ldots, p_n$ in the plane, the center of gravity is $c = \frac{1}{n} \sum_{i=1}^{n} p_i$. 

A robot is initially in a waiting state (Wait). Asynchronously and independently 
from the other robots, it observes the environment (Look) by activating 
its sensors. The sensors return a snapshot of the world, i.e., the set of all points 
that are occupied by at least one other robot, with respect to the local coordinate 
system. The robot then calculates its destination point (Compute) according to 
its deterministic algorithm (the same for all robots), based only on its local view 
of the world. It then moves towards the destination point (Move); if the destination 
point is the current location, the robot stays still. A move may stop before 
the robot reaches its destination. The robot then returns to the waiting state. 
The sequence Wait - Look - Compute - Move forms a cycle of a robot. 
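
One cycle can be sketched as follows. This is a simplified illustration, not part of the model definition: the local coordinate system is assumed to differ from the global one only by a translation (no rotation or scaling), the adversary's interruption of a move is modelled by a hypothetical `max_step` bound, and `midpoint_rule` is an arbitrary example of a Compute step.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def move_towards(position: Point, destination: Point, max_step: float) -> Point:
    """Move: travel towards the destination, but at most max_step far."""
    distance = math.dist(position, destination)
    if distance <= max_step:
        return destination
    f = max_step / distance
    return (position[0] + f * (destination[0] - position[0]),
            position[1] + f * (destination[1] - position[1]))


def run_cycle(position: Point, others: List[Point],
              compute: Callable[[List[Point]], Point],
              max_step: float) -> Point:
    """Look - Compute - Move for one robot; returns its new position."""
    # Look: snapshot of the other robots with the observer at the origin.
    snapshot = [(x - position[0], y - position[1]) for x, y in others]
    # Compute: the destination is expressed in local coordinates.
    dx, dy = compute(snapshot)
    destination = (position[0] + dx, position[1] + dy)
    # Move: the move may stop before the destination is reached.
    return move_towards(position, destination, max_step)


def midpoint_rule(snapshot: List[Point]) -> Point:
    """Example Compute step: center of gravity of all robots, self included."""
    pts = snapshot + [(0.0, 0.0)]
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
```

With `max_step` smaller than the distance to the destination, the robot stops early, which mirrors the statement that a move may end before the destination is reached.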

The robots are fully asynchronous, i.e., the amount of time spent in each 
state of a cycle is finite but otherwise unpredictable. In particular, the robots 
do not have a common notion of time. As a result, robots can be seen by other 
robots while moving, and thus computations can be made based on obsolete 
observations. The robots are anonymous, meaning that they are a priori indis- 
tinguishable by their appearance, and they do not have any kind of identifiers 
that can be used during the computation. Finally, the robots have no means of 
direct communication: any communication occurs in a totally implicit manner, 
by observing the other robots’ positions. 

There are two limiting assumptions concerning infinity: The amount of time 
required by a robot to complete a cycle is not infinite, nor infinitesimally small; 
and the distance traveled by a robot in a cycle is not infinite, nor infinitesimally 
small (unless it brings the robot to the destination point). As no other assump- 
tions on space exist, the distance traveled by a robot in a cycle is unpredictable. 
All times and distances are under control of the adversary. We assume in our 
algorithms that the adversary is fair, in the sense that he respects the previous 
assumptions, and that no robot sleeps forever, since otherwise no algorithm can 
guarantee to gather the robots. 

For the remainder of this paper, we assume that the robots are non-oblivious, 
meaning that each robot is equipped with infinite memory, and its computation 
in each cycle can be based on its observations and computation results from 
previous cycles. 



3 Notation 



In general, r indicates any robot in the system; when no ambiguity arises, r is 
used also to represent the point in the plane occupied by that robot. A configu- 
ration of the robots at a given time instant t is the set of positions in the plane 
occupied by the robots at time t. 

We say that a point p is on a circle if it is on the circumference of the circle, 
and that p is inside the circle if it is strictly inside the circle. Given three distinct 
points p, q and c, we denote by $\angle(p, c, q)$ the convex angle (i.e., the angle that is 
at most 180°) between p and q, centered in c. The Euclidean distance between p 
and q is denoted by dist(p, q). 

Fig. 1. Smallest enclosing circle SEC for 8 points. 

Fig. 2. Proof of Lemma 2. Center of SEC cannot be at q. 

Given a set of n distinct points P in the plane, the smallest enclosing circle 
of the points is the circle with minimum radius such that all points from P are 
inside or on the circle (see Figure 1). We denote it by SEC(P), or SEC if the set 
P is unambiguous from the context. The smallest enclosing circle of a set of n 
points is unique and can be computed in polynomial time [9]. 
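
To make the polynomial-time claim concrete, here is a brute-force sketch that runs in O(n^4) time. It is not the algorithm of [9]; it only relies on the standard fact that the smallest enclosing circle is determined by two or three of the points. The function names and the tolerance `eps` are assumptions made for this illustration.

```python
import math
from itertools import combinations


def circle_from_two(a, b):
    """Circle with segment ab as diameter, as (cx, cy, radius)."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)


def circle_from_three(a, b, c):
    """Circumcircle of a, b, c, or None if the points are collinear."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))


def encloses(circle, points, eps=1e-9):
    """True if every point is inside or on the circle (up to eps)."""
    cx, cy, r = circle
    return all(math.dist((cx, cy), p) <= r + eps for p in points)


def smallest_enclosing_circle(points):
    """Try every circle defined by 2 or 3 points; keep the smallest valid one."""
    if len(points) == 1:
        return (points[0][0], points[0][1], 0.0)
    candidates = [circle_from_two(a, b) for a, b in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for circle in candidates:
        if encloses(circle, points) and (best is None or circle[2] < best[2]):
            best = circle
    return best
```

Faster algorithms exist, but for the handful of robots in an example configuration this exhaustive search is enough to experiment with SEC.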

The smallest enclosing circle of P remains invariant if we move some of the 
points from P that are inside SEC such that they remain inside SEC; moreover, 
the maximum angle between any two adjacent points on SEC w.r.t. the center of 
SEC is 180°, since otherwise there would be a smaller circle enclosing all points. 
The following lemma shows that the smallest enclosing circle remains invariant 
even if we move the points along the rim of SEC, as long as no angle of more 
than 180° between adjacent points occurs. 

Lemma 1. Let $P = \{p_1, \ldots, p_k\}$ be k points on a circle C with center c. If the 
maximum angle between any two adjacent points w.r.t. c is at most 180°, then C 
is the smallest enclosing circle of the points. 

Proof (sketch). The idea of the proof is as follows (cf. Figure 2): Assume that 
the center of SEC(P) would be at some point $q \ne c$. Then there are two adjacent 
points $x, y \in P$ such that their angle w.r.t. c is minimum (and at most 180°), and 
such that q is within the sector of C that is beyond c and delimited by the lines 
$\ell_x$ and $\ell_y$ from x and y, respectively, through c (bottom sector in Figure 2). Let 
$\ell$ be the perpendicular line that bisects the angle between x and y (dashed line $\ell$ 
in Figure 2). If x and q are not on the same side of $\ell$, then dist(x, c) < dist(x, q); 
otherwise, y and q are not on the same side of $\ell$, and dist(y, c) < dist(y, q). In 
both cases, the radius of C is at most the radius of SEC(P). Thus, since the 
smallest enclosing circle is unique, we have C = SEC(P). □ 
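
The hypothesis of Lemma 1 is easy to test numerically. The sketch below interprets "angle between adjacent points" as the angular gap between consecutive points when they are sorted around c, which is an assumption about the intended reading; the function names are hypothetical.

```python
import math


def max_adjacent_angle(points, center):
    """Largest angular gap (in degrees) between consecutive points around center."""
    angles = sorted(math.atan2(y - center[1], x - center[0]) for x, y in points)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    # Gap that wraps around from the last point back to the first one.
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    return math.degrees(max(gaps))


def satisfies_lemma1(points, center, eps=1e-9):
    """True if no gap exceeds 180 degrees, i.e. the lemma's hypothesis holds."""
    return max_adjacent_angle(points, center) <= 180.0 + eps


# Three points spread evenly on the unit circle: every gap is 120 degrees.
even = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in (0, 120, 240)]
# Three points squeezed into a 60 degree arc: the remaining gap is 300 degrees.
bunched = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in (0, 30, 60)]
```

For `even` the unit circle is the smallest enclosing circle, while for `bunched` a strictly smaller enclosing circle exists, in line with the remark that precedes the lemma.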



4 Gathering Two Robots 

In this section, we present an algorithm that solves the Gathering Problem 
for two robots. The idea of our algorithm, which is similar to the algorithm 
presented in [8], is as follows: The two robots move away from each other until 
both have seen the configuration at least once. Then they know the connecting 
line $\ell$ through their initial positions. In a next phase, they both move on lines 
that are perpendicular to $\ell$, again until both have seen the other robot at least 
once on its perpendicular line. Then they both know $\ell$ and its intersection with 
the two perpendicular lines; hence, they can gather on $\ell$ in the center between 
the perpendicular lines. 

Algorithm 1 Gathering two robots 

If first observation Then 
    $x_0$ := my position; $y_0 \leftarrow$ other robot's position; 
    $\ell \leftarrow$ line through $x_0$ and $y_0$; $d_0 \leftarrow$ distance between $x_0$ and $y_0$; 
    state $\leftarrow$ 1; move on $\ell$ by distance […] away from $y_0$. 
If state = 1 Then 
    If other robot is at $y_0$ Then do nothing; 
    Else 
        $x_{perp} \leftarrow$ my position; $y_1$ := other robot's position; 
        $d_1$ := distance between $x_{perp}$ and $y_1$; 
        If other robot is on $\ell$ Then state $\leftarrow$ 2; move perpendicular to $\ell$ by distance […]; 
        Else 
            $y_{perp} \leftarrow$ intersection between $\ell$ and line through other robot's position perpendicular to $\ell$; 
            $d_{perp} \leftarrow$ distance between $x_{perp}$ and $y_{perp}$; 
            state $\leftarrow$ 3; move perpendicular to $\ell$ by distance […]; 
If state = 2 Then 
    If other robot is on $\ell$ Then do nothing; 
    Else 
        $y_{perp} \leftarrow$ intersection between $\ell$ and line through other robot's position perpendicular to $\ell$; 
        $d_{perp} \leftarrow$ distance between $x_{perp}$ and $y_{perp}$; 
        state $\leftarrow$ 3; do nothing; 
If state = 3 Then 
    If other robot is on the line perpendicular to $\ell$ through $y_{perp}$ and less than $d_{perp}$ away from $\ell$ 
        Then move perpendicular to $\ell$ to distance $d_{perp}$; 
    Else 
        $g \leftarrow$ center point between $x_{perp}$ and $y_{perp}$; 
        state $\leftarrow$ 4; do nothing; 
If state = 4 Then 
    If I am not at g Then move to g Else state $\leftarrow$ STOP; do nothing; 
End. 
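
The final step, meeting on the connecting line in the center between the two perpendicular lines, reduces to a projection and a midpoint. The following sketch assumes global coordinates and uses hypothetical helper names; a and b are any two distinct points that define the connecting line.

```python
def project_onto_line(p, a, b):
    """Foot of the perpendicular dropped from p onto the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    return (a[0] + t * dx, a[1] + t * dy)


def gathering_point(x_perp, other_position, a, b):
    """Midpoint between x_perp and the foot of the other robot's perpendicular line.

    x_perp is the point where this robot left the connecting line; the other
    robot is observed somewhere on its own perpendicular line.
    """
    y_perp = project_onto_line(other_position, a, b)
    return ((x_perp[0] + y_perp[0]) / 2, (x_perp[1] + y_perp[1]) / 2)
```

For example, with the connecting line along the x-axis, a robot that left the line at (-1, 0) and sees the other robot at (3, 2) computes the gathering point (1, 0).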

Lemma 2. Two robots can gather at a point. 

Proof. Both robots perform Algorithm 1. Here, we use $\leftarrow$ to assign a value to a 
variable that is stored in the permanent memory of the robot (and is available 
in subsequent cycles), while we use := to assign values to variables that are only 
used in the current cycle. 

We now prove that this algorithm gathers the two robots. Let r and s be the 
two robots. The following proofs are presented from the point of view of one robot 
r; analogous proofs yield the same propositions for the other robot. We denote 
the variables of robot r and s with superscript r and s, respectively. Let $\ell_{init}$ 
be the line through the initial positions of the robots, before any of the robots 
made its first movement. A schematic illustration of the robots’ movements can 
be found in Figure 3. 

Fig. 3. Illustration of the algorithm for two robots. Distances are not drawn to scale. 

1. If robot r is the first robot that leaves $\ell_{init}$, then both robots agree on $\ell$, i.e., 
$\ell^r = \ell^s = \ell_{init}$. 

Proof. If r leaves $\ell_{init}$ while s is still on the line, then r is in state 1 and 
$\ell^r = \ell_{init}$. Moreover, r has seen s in two different positions on $\ell_{init}$; thus, s 
has moved on $\ell_{init}$ before r leaves $\ell_{init}$. Hence, s has seen $\ell_{init}$ already, and 
we have $\ell^s = \ell_{init}$. 

2. Both robots eventually leave $\ell_{init}$. 

Proof. Assume that robot r wakes up first. Then it moves by […] and enters 
state 1. As soon as s has moved at least once, r moves away from $\ell_{init}$ by 
either […] if s is still on $\ell_{init}$, or by […] if s has left $\ell_{init}$. Hence, as soon as 
r has observed the first movement of s, it leaves $\ell_{init}$. If robot s has left $\ell_{init}$ 
at that time already, we are done. Otherwise, we know from Item 1 that s 
is in state 1, since it already knows $\ell_{init}$, but it is still on $\ell_{init}$. Hence, when 
s wakes up the next time, it observes that r has left $\ell_{init}$, and s moves away 
from $\ell_{init}$ by […]. 

3. Every subsequent movement of robot r after it left line $\ell_{init}$ is perpendicular 
away from $\ell_{init}$, until it reaches state 4. 

Proof. When r moves away from $\ell_{init}$ for the first time, it is in state 1. By 
construction, this movement is perpendicular away from $\ell_{init}$, starting in 
$x_{perp}$, by either distance […] or […]. Afterwards, robot r moves only if it 
is in state 3, and there the movements are by definition perpendicular to 
$\ell_{init}$. It remains to show that r always moves away from $\ell_{init}$. Since […] 
never changes, it is sufficient to show that […]. To see this, let $d_{init}$ 
be the distance between the initial positions of the robots. When the robots 
wake up first, each of them makes one movement by at most […] and […], 
respectively, on $\ell_{init}$, away from the other robot. Afterwards, all movements 
are perpendicular to $\ell_{init}$. Hence, we have $d_{init}$ […] $d_1 < d_{init} +$ […]. 
With […] $< d_{init} +$ […] and […] $< d_{init} +$ […], straightforward analysis shows 
that $d_1 <$ […] $d_{init}$. This yields the claim, since $d_{perp}$ is obviously greater than 
$d_{init}$. 

4. Both robots eventually agree on point g, and gather there. 

Proof. Due to Item 2, both robots eventually leave line $\ell_{init}$, say at positions 
$r_{perp}$ and $s_{perp}$. Let r be the robot that leaves $\ell_{init}$ first. Then r stores value 
$r_{perp}$ in $x^r_{perp}$, moves by […] away from $\ell_{init}$, and enters state 2, where 
it remains until s has left $\ell_{init}$. When s wakes up the next time, it 
observes that r has left $\ell_{init}$, and moves perpendicular away from $\ell_{init}$, too. 
Moreover, it stores $s_{perp}$ in $x^s_{perp}$, and $r_{perp}$ in $y^s_{perp}$, since robot r has moved 
only perpendicular to $\ell_{init}$ due to Item 3. The next time robot r wakes up, 
it observes that s has left $\ell_{init}$, too, and stores $y^r_{perp} = s_{perp}$ (again, since s 
moved only perpendicular to $\ell_{init}$). Hence, both robots agree on the points 
$r_{perp}$ and $s_{perp}$ where they left $\ell_{init}$, on the distance $d_{perp}$ between these points, 
and on the center point g. Moreover, both robots move on their perpendicular 
line until at least one of them, say s, has reached distance $d_{perp}$ from $\ell_{init}$ 
(state 3). When this is observed by the other robot r, it enters state 4 and 
moves straight towards g; hence, r leaves its perpendicular line. When s 
wakes up the next time, it observes that r has left its perpendicular line, 
and s starts moving towards g, too. Eventually, both robots reach g and 
gather there. 



□ 



5 Gathering n > 2 Robots 

We now show how to gather more than two robots. We split the algorithm up into 
two separate cases, depending on the number of robots on the smallest enclosing 
circle SEC in the initial configuration: if there are at least three robots on SEC, 
we make all robots move on circles around the center of SEC until all robots 
know SEC (which does not change during the movements); then we gather the 
robots at the center of SEC. This is shown in the following Lemma 3. On the 
other hand, if there are exactly two robots on SEC, then we adapt the algorithm 
for two robots from Section 4 to gather all robots on the line connecting the two 
robots on SEC. This is shown in Lemma 4. 

Lemma 3. If there are more than 2 robots on the smallest enclosing circle in 
the initial configuration, then the robots can gather at a point. 

Proof. Given a configuration of the robots, we define a movement angle $\gamma$ and a 
movement direction moveDir for each robot r as follows (cf. Figure 4): Let c be 
the center of the smallest enclosing circle of all robots. Let C be the circle with 
center c such that r is on C. If there is no other robot on C, then let $\gamma =$ […] 
and moveDir be an arbitrary direction on C, say clockwise. If there are exactly 
two robots on C, then let s be the other robot. [By assumption, C is not the 
smallest enclosing circle of the robots.] Let $\alpha$ and $\beta$ be the two angles between r 
and s w.r.t. c. Assume w.l.o.g. $\alpha \le \beta$. Let $\gamma =$ […], and let moveDir be in the 
direction of angle $\beta$. If there are more than two robots on C, then let s and t be 
the two robots on C that are adjacent to r. Let $\alpha$ be the angle between r and s 
w.r.t. c, and $\beta$ be the angle between r and t w.r.t. c. Assume w.l.o.g. that $\alpha \le \beta$. 
Then $\alpha < 180^\circ$. If $\alpha \le 178^\circ$, then let $\gamma =$ […] and moveDir = t. If $178^\circ < \alpha$, 
then $\gamma =$ […] and moveDir = t. 

Fig. 4. Idea of Algorithm 2. Arrows indicate movement directions of the robots. 
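
The geometric ingredients of this definition, namely the angles to the adjacent robots on C and a move by a given angle along C, can be sketched as follows. The concrete values of the movement angle are not reproduced here, so the angle is simply a parameter; global coordinates and the function names are assumptions of this illustration.

```python
import math


def rotate_about(p, c, angle_deg, clockwise=False):
    """Move p along its circle around c by angle_deg degrees."""
    theta = math.radians(-angle_deg if clockwise else angle_deg)
    dx, dy = p[0] - c[0], p[1] - c[1]
    return (c[0] + dx * math.cos(theta) - dy * math.sin(theta),
            c[1] + dx * math.sin(theta) + dy * math.cos(theta))


def adjacent_angles(r, others_on_circle, c):
    """Angles (degrees) from r to its counterclockwise and clockwise neighbours.

    others_on_circle must be non-empty and must not contain r itself.
    """
    base = math.atan2(r[1] - c[1], r[0] - c[0])
    offsets = [
        math.degrees((math.atan2(y - c[1], x - c[0]) - base) % (2 * math.pi))
        for x, y in others_on_circle
    ]
    return min(offsets), 360.0 - max(offsets)
```

A robot would compare the two returned angles to decide which neighbour plays the role of s (smaller angle) and which the role of t (larger angle), and then call `rotate_about` in the direction of t.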

If robot r observes the configuration of all robots, it can order the other robots 
in a unique way, for instance by using the coordinates of the robots' positions in 
the local coordinate system of robot r. We assume w.l.o.g. that robot r has index 
1 in this ordering. Recall that different robots may have different coordinate 
systems; hence, the robots do not agree on this ordering. We will ensure in our 
algorithm that the basic configuration remains invariant; in particular, robots 
will stay on the same circle with center c, and no two robots on the same circle 
will interchange their position. Each robot stores the positions of all robots that 
it observes in its first cycle in an array posns, where $posns_j$ denotes the position 
of robot $r_j$. Hence, in later cycles robot r can compare the current position 
of a robot $r_j$ with the position of $r_j$ observed in its first cycle. This allows r to 
determine whether $r_j$ has made at least one movement at some time. In addition, 
robot r maintains a vector hasMoved, such that $hasMoved_j$ is set to true if r 
has observed at least once that robot $r_j$ has moved. 
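
The bookkeeping with posns and hasMoved can be sketched as below. The sketch assumes that the currently observed positions are already listed in the same order as posns; in the paper this matching is what the invariants (robots stay on their circle and never swap places) are for. The function name and the tolerance `eps` are hypothetical.

```python
import math


def update_has_moved(posns, current, has_moved, eps=1e-9):
    """Mark every robot whose position differs from the stored one.

    Returns True if at least one entry of has_moved changed from False to
    True during this call.
    """
    changed = False
    for j, (old, new) in enumerate(zip(posns, current)):
        if not has_moved[j] and math.dist(old, new) > eps:
            has_moved[j] = True
            changed = True
    return changed


posns = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # positions seen in the first cycle
has_moved = [True, False, False]               # index 0 is the observing robot itself
first = update_has_moved(posns, [(0.0, 0.0), (1.0, 0.1), (0.0, 1.0)], has_moved)
second = update_has_moved(posns, [(0.0, 0.0), (1.0, 0.1), (0.0, 1.0)], has_moved)
```

Once `all(has_moved)` holds, the observing robot knows that every robot has been awake at least once and has therefore seen the configuration.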

The algorithm that the robots perform is shown in Algorithm 2, and an 
illustration can be found in Figure 4. We prove that the robots gather at c, the 
center of the smallest enclosing circle of the robots' initial positions, by showing 
the following items: 

1. Every robot makes at most n moves by angle 7 in its direction moveDir. 

Proof. A robot only moves in direction moveDir in states 2 and 3. If it is in 
state 2, then it moves once in direction moveDir, sets $hasMoved_1$ = true, 
and changes into state 3. In state 3, it moves in direction moveDir if a 
value $hasMoved_j$ has changed from false to true (i.e., if another robot has 
moved). This can happen at most n − 1 times, once for each other robot. 

2. The angle between two adjacent robots on the same circle changes by at most 
1°. 

Proof. We have $\gamma \le$ […] by definition, and each robot moves at most n 
times by its angle $\gamma$. Hence, the movement of a single robot changes the 
angle between itself and its neighbors by at most $n\gamma \le$ […]. Thus, even if 
two adjacent robots move in opposite directions, the angle between them 
changes by at most 1°. 

Algorithm 2 Gathering with more than 2 robots on SEC 

If this is my first observation Then 
    $n \leftarrow$ number of robots; 
    SEC $\leftarrow$ smallest enclosing circle of all robots; $c \leftarrow$ center of SEC; 
    If I am at c Then 
        d := minimum distance of any other robot to c; 
        state $\leftarrow$ 2; move away from c by distance […]; 
    Else 
        If some robot is at c Then state $\leftarrow$ 1; do nothing; 
        Else state $\leftarrow$ 2; do nothing; 
If state = 1 Then 
    If a robot is at c Then do nothing; 
    Else state $\leftarrow$ 2; do nothing; 
If state = 2 Then 
    posns $\leftarrow$ all robots' positions, with $posns_1$ my own position; 
    $\forall j$: $hasMoved_j \leftarrow$ false; $hasMoved_1 \leftarrow$ true; 
    $\gamma \leftarrow$ my movement angle; moveDir $\leftarrow$ my movement direction; 
    state $\leftarrow$ 3; move by angle $\gamma$ in direction moveDir; 
If state = 3 Then 
    If a robot decreased its distance from c Then state $\leftarrow$ 4; do nothing; 
    Else 
        $\forall j$ such that robot $r_j$ changed its position w.r.t. posns: $hasMoved_j \leftarrow$ true; 
        If $\forall j$: $hasMoved_j$ = true Then state $\leftarrow$ 4; do nothing; 
        If at least one value $hasMoved_j$ changed to true in this step Then 
            move by angle $\gamma$ in direction moveDir; 
        Else do nothing; 
If state = 4 Then 
    If I am not at c Then move to c Else state $\leftarrow$ STOP; do nothing; 
End. 

3. No two robots on the same circle interchange their position. 

Proof. Let v, w, x and y be adjacent robots (in this ordering) on the same 
circle with center c. Assume by contradiction that w and x interchange 
their positions. We show that this cannot happen even if w and x move 
towards each other. The other cases, where either x and w move in the same 
direction, or they move away from each other, can be shown analogously. If w 
and x move towards each other, then $\angle(w, c, x) \ge \angle(v, c, w)$ and $\angle(w, c, x) \ge 
\angle(x, c, y)$. By construction, we have $\gamma_w \le$ […]: if $\angle(v, c, w) \le 178^\circ$, 
then $\gamma_w =$ […]; on the other hand, if $\angle(v, c, w) > 178^\circ$, then $\gamma_w =$ […]. 
Analogously, $\gamma_x \le$ […]. Robot w moves at 
most by angle $n \cdot \gamma_w$ towards x, and robot x moves at most by angle $n \cdot \gamma_x$ 
towards w (due to Item 1). Hence, the new angle between w and x is at least 
$\angle(w, c, x) - n\gamma_w - n\gamma_x \ge \angle(w, c, x)(1 -$ […]$) >$ […] $> 0^\circ$, i.e., the two 
robots do not interchange their position. 

4. SEC remains invariant until at least one robot has reached its state 4. 

Proof. Until some robots reach their state 4, all robots move on circles with 
center c. Hence, the smallest enclosing circle can only change if the maximum 
angle between the robots on SEC becomes larger than 180° (cf. Lemma 1). 
By previous Item 2, the angle between adjacent robots changes by at most 
1°; thus, if all adjacent robots on SEC in the initial configuration have angle 
at most 178°, the smallest enclosing circle cannot change. If in the initial 
configuration there is exactly one angle between adjacent robots on SEC 
that is greater than 178°, say between robots x and y, then this is for both 
x and y the maximum adjacent angle. Hence, the moving direction of x is 
towards y by definition, and the moving direction of y is towards x. Thus, 
the angle between x and y decreases, and no angle of more than 180° can 
occur. 

For the case that there are 2 angles of more than 178°, first assume that 
there is no robot on SEC that has an angle of more than 178° to both 
neighbors (see Figure 5). Then there are two disjoint pairs of robots x, y and 
u, v such that the angle between x and y is greater than 178°, and the angle 
between u and v is greater than 178°. By construction, x moves towards y 
and y towards x, decreasing the angle between them. Likewise, u and v move 
towards each other. Hence, no angle greater than 180° occurs. 

Now assume that there is one robot r on SEC such that both angles $\alpha$ 
and $\beta$ to its two neighbors s and t, respectively, are greater than 178° (see 
Figure 6). Assume that $\alpha \le \beta$. Both s and t move towards r. By definition, 
the movement angle for robot r is $\gamma =$ […], and r moves towards t. Hence, 
the angle between r and t decreases. On the other hand, even if s does not 
move at all, and even if r moves by maximum angle $n\gamma$ towards t, then the 
new angle between s and r is at most $\alpha + n\gamma < 180^\circ$. Hence, SEC does not 
change. 

5. If a robot reaches its state 4, then all robots agree on SEC and c. 

Proof. A robot r reaches its state 4 only if $hasMoved^r_j$ = true for all $1 \le 
j \le n$. This yields the claim, since $hasMoved_j$ is set to true only if robot $r_j$ 
has made a move, i.e., if it was awake and had observed the configuration, 
including SEC and c. 

6. At least one robot eventually reaches its state 4. 

Proof. Let r be the first robot to wake up. Then r observes the initial 
configuration of the robots. If there is a robot at c in the initial configuration, 
then this robot moves away from c in its first cycle. Afterwards, every robot 
that wakes up moves on its circle by its movement angle $\gamma$. Assuming a fair 
schedule where no robot sleeps for an infinite time, after some finite time 
every robot has woken up at least once. If some other robot but r reaches 
its state 4, then the claim is true. Otherwise, as soon as robot r wakes up 
the next time, it observes that all other robots have moved since its first 
observation (i.e., $hasMoved^r_j$ = true for all $1 \le j \le n$), and r enters state 4. 

Fig. 5. Proof of Lemma 3, Item 4, for angles $\alpha, \beta > 178^\circ$. Angles are not drawn to scale. 

Fig. 6. Proof of Lemma 3, Item 4, for angles $\alpha, \beta > 178^\circ$. Angle between the 
dashed lines is $n\gamma$. Angles are not drawn to scale. 

Fig. 7. Idea of algorithm for two robots on SEC. Line m is the median perpendicular line. 

7. All robots eventually reach their state 4. 

Proof. Due to Item 6, at least one robot r reaches its state 4. In its next cycle, 
this robot moves towards c, i.e., it decreases its distance from c. Hence, all 
other robots that wake up afterwards observe this decrease of the distance, 
and enter their state 4. Assuming a fair schedule where no robot sleeps 
forever yields the claim. 

8. All robots gather at c and stop there. 

Proof. This is obvious, since all robots agree on c due to Item 5, all robots 
reach their state 4 due to Item 7, and each robot that is in state 4 moves 
towards c. 



□ 

We now show how to solve the Gathering Problem if only two robots are on 
the smallest enclosing circle in the initial configuration. 

Lemma 4. If n > 2, and there are exactly 2 robots on the smallest enclosing 
circle in the initial configuration, then the robots can gather at a point. 

Proof (sketch). Let x and y be the two robots on the smallest enclosing circle, and 
let $\ell$ be the line through x and y. Our algorithm works as follows (see Figure 7). 
First, all robots move “a little bit” until each robot has moved at least once. 
Here, both x and y move on $\ell$ away from each other. Every other robot r moves 
on a line perpendicular to $\ell$, without reaching the next robot (if any) on the 
same line. The movement of x and y changes the smallest enclosing circle (in 
fact, it increases the radius of the circle), but x and y remain the only robots 
on this circle. Hence, each of the other robots always moves on the same line 
perpendicular to $\ell$. As soon as all robots have made one move, they all know 
$\ell$ and all perpendicular lines. If the number of robots n is odd, then all robots 
gather at the intersection of $\ell$ and the median perpendicular line. Otherwise, 
they gather at the intersection of $\ell$ and the center line between the two median 
perpendicular lines. 

□ 
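
The choice of the gathering point in the proof sketch of Lemma 4 can be illustrated with a small computation. The following is only a sketch under stated assumptions: robots are given as coordinate pairs, the list `robots` includes x and y themselves, every robot contributes one perpendicular line, and the function name `gathering_point` is hypothetical. It does not model the asynchronous movement phase, only the final target.

```python
from statistics import median

def gathering_point(x, y, robots):
    """Return the target point on the line through x and y.

    Each robot is projected orthogonally onto the line; the projection
    parameter identifies the perpendicular line the robot lies on.
    The median parameter gives the median perpendicular line (odd n) or
    the centre line between the two median lines (even n).
    """
    dx, dy = y[0] - x[0], y[1] - x[1]
    length_sq = dx * dx + dy * dy
    # Parameter t of the foot of the perpendicular: foot = x + t * (y - x).
    params = [((p[0] - x[0]) * dx + (p[1] - x[1]) * dy) / length_sq
              for p in robots]
    t = median(params)
    return (x[0] + t * dx, x[1] + t * dy)
```

For example, with x = (0, 0), y = (4, 0) and three further robots at (1, 1), (3, -1) and (2, 0.5), the median perpendicular line meets the line through x and y at (2, 0).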

We summarize our result in the following theorem, which follows immediately 
from Lemmas 2, 3 and 4. 

Theorem 1. The Gathering Problem can be solved for n ≥ 2 non-oblivious
robots.

6 Conclusion 

We have presented an algorithm that gathers a set of n non-oblivious mobile
robots. Thus, it is sufficient to equip the robots with memory to make the
Gathering Problem solvable. Moreover, our results indicate that memory
is a more powerful capability than multiplicity detection, since we have shown
that two robots with memory can gather, while two robots with multiplicity
detection cannot [8].

Our algorithm makes generous use of memory, as it stores, among others, the 
exact positions of all robots. It would be interesting to see whether this could 
be significantly reduced. What is the minimum amount of memory necessary to 
solve the Gathering Problem? 
Abstract. Circulant graphs are popular network topologies that arise in
distributed computing. In this paper, we show that, for circulant graphs,
a simple condition for isomorphism, combined with lattice reduction
algorithms, can be used to develop efficient distributed algorithms. We
improve the known upper bounds on the vertex-bisection (respectively
the edge-bisection) width of circulant graphs. Our method is novel and
provides a polynomial-time algorithm to partition the set of vertices
(respectively the set of edges) to obtain these bounds and the respective
sets. By exploiting the knowledge of the bisection width of this topology,
we introduce generic distributed algorithms to solve the gossip problem
in these networks. We present lower and upper bounds on the number of
rounds in the vertex-disjoint and the edge-disjoint paths communication
models when the number of nodes is prime.



1 Introduction 

Circulant graphs are popular network topologies that arise in distributed com- 
puting (e.g., supercomputer architectures [14]) and in quantum walk analysis [3]. 
Unlike other highly regular network topologies, computing the shortest paths in 
circulant graphs can be challenging (e.g., NP-hard [11]). 

The bisection width of a network topology is an important factor for
determining the complexity of distributed algorithms in which information has to be
exchanged between two subsets of the network. In this paper, we give new upper
bounds on the vertex-bisection (respectively the edge-bisection) width of
circulant graphs. Our upper bound on the vertex-bisection width of circulant graphs
with n nodes provides an improvement of a factor O(ln n) compared to the best
known results when n is prime. Moreover, our method provides a polynomial-time
algorithm to partition the set of vertices (respectively the set of edges) to
obtain these bounds and the respective sets. By exploiting this knowledge, we
introduce generic distributed algorithms to solve the gossip problem. We give
lower and upper bounds on the number of rounds required by these algorithms
in the vertex-disjoint and the edge-disjoint paths communication models.

Circulant graphs are regular graphs based on Cayley graphs defined on the
Abelian group $\mathbb{Z}_n$. We recall that an n-vertex circulant graph G is a graph whose
adjacency matrix $A = (a_{ij})_{i,j=1}^{n}$ is a circulant. That is, the ith row of A is the
cyclic shift of the first row by i − 1, $a_{ij} = a_{1,j-i+1}$, with i, j = 1, ..., n. In this
section, the subscripts are taken modulo n, that is $a_{ij} = a_{i+n,j} = a_{i,j+n}$ for all
integers i and j (the interval [1, n] is more convenient here). We also assume that
$a_{ii} = 0$, i = 1, ..., n. Therefore, with every circulant graph one can associate a
set $S \subseteq \mathbb{Z}_n$ of the positions of non-zero entries of the first row of the adjacency
matrix of the graph. Respectively we denote by $\langle S \rangle_n$ the corresponding graph.
The elements of the generating set S are called chords.

We recall that two graphs $G_1$, $G_2$ are isomorphic, and write $G_1 \simeq G_2$, if
their adjacency matrices differ by a permutation of their rows and columns. For
general graphs the isomorphism problem is known to be in NP, not known to
be in P, and probably is not NP-complete (e.g., see [6, Section 6]). We say that
sets $S, T \subseteq \mathbb{Z}_n$ are proportional, and write $S \sim T$, if for some integer l with
gcd(l, n) = 1, S = lT where the multiplication is taken over $\mathbb{Z}_n$. Obviously,
$S \sim T$ implies $\langle S \rangle_n \simeq \langle T \rangle_n$. For example ($S_1 = \{\pm 2, \pm 10\}$, $S_2 = \{\pm 3, \pm 8\}$, and
n = 23), $\langle S_1 \rangle_n \simeq \langle S_2 \rangle_n$ since $S_1 \sim S_2$ (with l = 16). Although it was conjectured
by Ádám [1], the inverse statement is not true as counterexamples exist for any
values of n except some n of the form $n = 2^{\alpha} 3^{\beta} m$, where $\alpha \in \{0, 1, 2, 3\}$, $\beta \in
\{0, 1, 2\}$, gcd(m, 6) = 1 and m is squarefree. For example in $\mathbb{Z}_{16}$, the 6-element
sets $S_1 = \{\pm 1, \pm 2, \pm 7\}$ and $S_2 = \{\pm 1, \pm 6, \pm 7\}$ verify the isomorphism $\langle S_1 \rangle_{16} \simeq
\langle S_2 \rangle_{16}$ but $S_1 \not\sim S_2$. However the simple isomorphism rule holds for important
special cases (for example, circulant graphs with prime number of vertices [12]
or with 4-element sets S). Under some additional restrictions, the isomorphism
property of graphs can be replaced by the property of their isospectrality [22].
The relative independence of link length from delay time opens up the possibility 
of distinguishing among isomorphic networks on the basis of their algorithmic 
performance. A network that provides labelled edges should be able to exploit 
the same properties as one with different labelling if the graphs are isomorphic. 
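
The proportionality relation on chord sets can be checked by brute force for small n. The sketch below is only illustrative: the function names are hypothetical, chord sets are passed as lists of positive representatives, and each chord s is taken to stand for the pair ±s as in the examples of this section.

```python
from math import gcd

def symmetric(chords, n):
    """Expand representatives s into the symmetric set {+s, -s} modulo n."""
    return frozenset(c % n for s in chords for c in (s, -s))

def proportional_multipliers(S, T, n):
    """Return all l in [1, n-1] with gcd(l, n) = 1 and S = l*T over Z_n."""
    S_set, T_set = symmetric(S, n), symmetric(T, n)
    return [l for l in range(1, n)
            if gcd(l, n) == 1
            and frozenset(l * t % n for t in T_set) == S_set]
```

On the example with n = 23 this search finds the multiplier l = 16 quoted in the text (together with its negative, 7), and on the 16-vertex example it finds no multiplier, in line with the statement that those two sets are not proportional.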

2 Bisection of Circulant Graphs 

For any graph G = (V, E), a vertex bisector of G is a set of vertices $V' \subset V$
such that the removal of the edges incident to the vertices of $V'$ splits G into two
components $G^1$ and $G^2$ of the same size (that is, $\bigl| |V(G^1)| - |V(G^2)| \bigr| \le 1$). $G^1$ and
$G^2$ are called the two halves of the bisection. The vertex-bisection width vw(G) of
G is defined as: vw(G) = min{$|V'|$ such that $V'$ is a vertex bisector of G}. The
edge-bisection width ew(G) of G is the minimum number of edges whose deletion
yields two components $G^1$ and $G^2$ such that $|V(G^1)| = \lfloor n/2 \rfloor$ and $|V(G^2)| =
\lceil n/2 \rceil$ where n = |V(G)|. The problems are not equivalent: the complete graph
has no vertex bisector, whilst it has an edge-bisection set of size $\lfloor n/2 \rfloor \lceil n/2 \rceil$. Both
problems are NP-complete, but lower and upper bounds are known for most
of the regular topologies of networks (for example, see [20]). Upper bounds on
the vertex-bisection width of Cayley graphs (with a generating set of cardinality
r) were given in [4] and improved in [9], in the relaxed case where each of
$|V(G^1)|$ and $|V(G^2)|$ only has to exceed a constant fraction of n, to: $vw(G) < c(r)\, n^{1-1/r}$ where c(r) is a constant depending
only on r. In [15], it has been proved that an Abelian Cayley graph G can be 
separated into two equal parts by deleting less than In | vertices. 
As circulant graphs are vertex-transitive, an upper bound on the edge-bisection
width of any circulant graph of n vertices can be given from the partitioning
of the vertex set (cyclically labelled {0, ..., n − 1}) into two halves: $V(G^1) =
[0, \ldots, \lfloor n/2 \rfloor - 1]$ and $V(G^2) = [\lfloor n/2 \rfloor, \ldots, (n - 1)]$ or any rotation of such a cut,
as shown in Figure 1, say $V(G^1) = [a, \ldots, b]$ and $V(G^2) = [b + 1, \ldots, a - 1]$ where
all operations are taken modulo n and $b = a + \lfloor n/2 \rfloor - 1 \pmod n$. Without
loss of generality, let us label the nodes on the ring cyclically and clockwise, and
let us assume $1 \le s_1 < s_2 < \ldots < s_r \le n/2$.




Fig. 1. Bisection of a Circulant Graph. 



Lemma 1. The edge-bisection width of any circulant graph of degree 2r with n
vertices and chord set $S = \{\pm s_1, \ldots, \pm s_r\}$ is at most $2(|s_1| + \ldots + |s_r|)$.

Proof. Let us partition the vertex set {0, ..., n − 1} into two disjoint sets with
the same order (within one), as described above (see Figure 1). We count the
number of chords of type $s_i$ which are “cut” at a and b. All the positive chords
(clockwise on the figure) outgoing from node b are cut. In particular, the largest
positive chord $s_r$ of length k outgoing from node b is cut. In fact, the same
type of chord $s_r$ outgoing from nodes [b + 1 − k, ..., b] is cut. Similarly, in the
neighbourhood of node a, all the nodes [a − k, ..., a − 1] have their outgoing
chord $s_r$ cut. No other node has its outgoing chord $s_r$ cut (otherwise, it
would require its $s_r$ chord to be larger than k). Hence, the number of chords
of type $s_r$, of length $k = s_r$, which need to be deleted to bisect the graph is:

$|[b + 1 - k, \ldots, b]| + |[a - k, \ldots, a - 1]| = k + k = 2k = 2 s_r$. Similarly, the edge
bisecting set includes $2 s_i$ of each type of chord $s_i$ (of length $k_i = s_i$), and the
lemma follows. Note that “negative” chords (i.e., $\{-s_1, -s_2, \ldots, -s_r\}$) have been
already counted as they correspond to incoming edges while labelling clockwise.
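
The count in the proof of Lemma 1 can be checked numerically on small instances. The following sketch uses the unrotated cut (a = 0), lists only the positive representatives of the chords, and assumes each chord is at most ⌊n/2⌋; the function name `cut_size` is hypothetical.

```python
def cut_size(n, chords):
    """Count edges of the circulant graph crossing the contiguous bisection.

    The halves are [0, ..., n//2 - 1] and [n//2, ..., n - 1].
    Edges are stored as unordered pairs so that +s and -s are not
    counted twice.
    """
    half = n // 2

    def side(v):
        return v % n < half

    edges = {frozenset((v, (v + s) % n)) for v in range(n) for s in chords}
    # An edge is cut when its two endpoints lie in different halves.
    return sum(1 for e in edges
               if len(e) == 2 and len({side(v) for v in e}) == 2)
```

For n = 23 with chords {±2, ±10} the cut contains 24 edges, and for n = 16 with chords {±1, ±2, ±7} it contains 20 edges, matching 2(|s_1| + ... + |s_r|) in both cases.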

Similarly, we give an upper bound of the vertex-bisection width. 
Lemma 2. The vertex-bisection width vw(G) of any circulant graph of degree
2r with n vertices and chord set $S = \{\pm s_1, \ldots, \pm s_r\}$ is at most $2 \max_{1 \le i \le r} |s_i|$.

Proof. Let us partition the vertex set {0, ..., n − 1} into three disjoint subsets
of V: $V'$, $V(G^1)$ and $V(G^2)$ such that $V'$ is a vertex bisector. Our proof is
constructive. Initially, set $V' = \emptyset$ and let $V(G^1)$ and $V(G^2)$ be the two sets of
vertices obtained by an edge-bisection of G (as described in Lemma 1). We
remove the nodes b + 1, ..., b + k − 1 from $V(G^2)$ and add them to $V'$ (and delete all
incident edges accordingly). Similarly, remove the nodes a, ..., a + k − 2 from
$V(G^1)$ and add them to $V'$ (and delete all incident edges accordingly). Any
path between a node of $V(G^1)$ and a node of $V(G^2)$ must either include the
chord $s_r$ from node b to node b + k, or include the chord $s_r$ from node a − 1
to node a + k − 1. Indeed, the path can neither use a chord larger than k, nor
use an intermediate node (as they are all in $V'$ now). By adding nodes a − 1
and b to $V'$, and deleting all incident edges accordingly, we bisect G as desired.
As we removed the same number of vertices from the original sets $V(G^1)$ and
$V(G^2)$, it is easy to verify that they are of the same size (within one). Clearly,
$|V'| = |[b, \ldots, b + k - 1]| + |[a - 1, \ldots, a + k - 2]| = k + k = 2k = 2|s_r|$.

Using Lemmas 1 and 2, the isomorphism rule and lattice reduction
algorithms [21], we can find an appropriate representation in polynomial time.

Corollary 1. For any circulant graph of degree 2r with n vertices and chord set
$S = \{\pm s_1, \ldots, \pm s_r\}$:

$$ew(G) \le 2 \min_{T \sim S} \sum_{i=1}^{r} |t_i| \quad\text{and}\quad vw(G) \le 2 \min_{T \sim S} \max_{1 \le i \le r} |t_i|$$

where the minima are taken over all sets $T = \{t_1, \ldots, t_r\}$ with $T \sim S$.
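
For small n the two minima of Corollary 1 can be found by exhaustive search over the multipliers l, which gives a feel for how much a good representation helps. This is a brute-force sketch only (the text relies on lattice reduction for a polynomial-time search); the function names are hypothetical, and |t| is taken to mean the absolute value of the residue of t in (−n/2, n/2].

```python
from math import gcd

def centered_abs(x, n):
    """Absolute value of the residue of x modulo n closest to zero."""
    x %= n
    return min(x, n - x)

def corollary_bounds(n, chords):
    """Return (bound on ew, bound on vw) by trying every l coprime to n."""
    best_sum = best_max = None
    for l in range(1, n):
        if gcd(l, n) != 1:
            continue
        t = [centered_abs(l * s, n) for s in chords]
        if best_sum is None or sum(t) < best_sum:
            best_sum = sum(t)
        if best_max is None or max(t) < best_max:
            best_max = max(t)
    return 2 * best_sum, 2 * best_max
```

For n = 23 and S = {±2, ±10}, the search gives ew(G) ≤ 12 (using l = 11, which maps S to {±1, ±5}) and vw(G) ≤ 8 (using l = 2, which maps S to {±4, ±3}), compared with 24 and 20 for the original chord set.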

Theorem 1. Let n = p be prime and let r = o(log p). For any circulant graph
of degree 2r with p vertices and chord set $S = \{\pm s_1, \ldots, \pm s_r\}$:

ew(G) < 

Proof. Let us consider the family of p points $(l s_1, \ldots, l s_r)$, $l \in \mathbb{Z}_p$. They all
belong (after reduction modulo p) to the r-dimensional cube $[0, p - 1]^r$ with
side length p. Let us consider the r-dimensional octahedron O of diameter 2L
centred at the origin which is defined as the set of points $(x_1, \ldots, x_r) \in \mathbb{R}^r$ with
$|x_1| + \ldots + |x_r| \le L$. The volume of O is $\mathrm{vol}\, O = (2L)^r / r!$. Because O is convex,

from Theorem 5.8 of [23] we derive that the cube $[0, p - 1]^r$ is covered by at most

T 

— ^Iogr+O(loglog r) P . 

volG 

parallel translates of O for some L = Therefore 

there is at least one translate of O containing at least two points corresponding to
some $0 \le l_1 < l_2 \le p - 1$. Therefore, putting $l = l_2 - l_1$, we see that

and from Corollary 1 we obtain the desired result. 
Remarks: (i) Using the known inequality r! < which holds for all integers

r > 7 we obtain that, provided that r is large enough: ew(G) <

(ii) If r → ∞ such that r = o(log p) then the bound of Theorem 1 becomes of
the form ew(G) < (r/e +

Theorem 2. Let n = p be prime. For any circulant graph of degree 2r with p
vertices and chord set $S = \{\pm s_1, \ldots, \pm s_r\}$: $vw(G) < 4 p^{1 - 1/r}$.

Proof. Let $N = \lceil p^{1/r} \rceil - 1$. Separating the cube $[0, p - 1]^r$ into $N^r < p$ equal subcubes

with the side length h = p/N, we see that there is at least one subcube that
contains at least two points corresponding to some $0 \le l_1 < l_2 \le p - 1$. Therefore,
as in the proof of Theorem 1, putting $l = l_2 - l_1$, we obtain $|l s_i| < h$, i = 1, ..., r.
Because p is prime and $1 \le l \le p - 1$, gcd(l, p) = 1 and, thus, Corollary 1 can be
applied to the set lS. If $p^{1/r} \le 2$, the bound is trivial. If $p^{1/r} > 2$ then we have
$N = \lceil p^{1/r} \rceil - 1 > 0.5\, p^{1/r}$. Thus $h < 2 p^{1 - 1/r}$ and the result follows.
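
The pigeonhole argument in the proof of Theorem 2 is constructive, and a direct transcription may help to follow it. The sketch below is illustrative only: the function name is hypothetical, floating-point arithmetic is used for $p^{1/r}$ and for the side length h (adequate for small examples, an assumption rather than a robust implementation), and it assumes r ≥ 2 and $p^{1/r} > 2$.

```python
from math import ceil

def pigeonhole_multiplier(p, chords):
    """Find l = l2 - l1 such that two points (l*s_1, ..., l*s_r) mod p
    fall into the same subcube of side h = p / N, N = ceil(p**(1/r)) - 1.
    """
    r = len(chords)
    N = ceil(p ** (1.0 / r)) - 1
    if N < 1:
        return None  # the bound of the theorem is trivial in this case
    h = p / N
    seen = {}  # subcube index -> first multiplier that landed in it
    for l in range(p):
        cell = tuple(int(((l * s) % p) // h) for s in chords)
        if cell in seen:
            return l - seen[cell]
        seen[cell] = l
    return None  # unreachable when N**r < p
```

With p = 23 and S = {±2, ±10}, we get N = 4 and h = 5.75; the multipliers 9 and 11 fall into the same subcube, so l = 2 and lS = {±4, ±3}, whose elements are all smaller than h.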

Remark. This is an improvement of a factor O(ln p) over the best

known bound, and if r = o(log p) then $vw(G) \le (2 + o(1))\, p^{1 - 1/r}$.

Composite values of n. It would be natural to try to extend our method to
composite values of n. This does not seem to be possible, as if n = 2m is even,
and $S_0 = \{\pm 1, \pm(m + 1)\}$ then $\min_{T \sim S_0} \max_{i = 1, 2} |t_i| \ge n/4$. Indeed, let $T = l S_0$.
If $|l| = |t_1| < m/2 = n/4$, then, because the condition gcd(l, n) = 1 implies
that l is odd, we have $|t_2| = m + l > m/2 = n/4$. Thus any methods using our
Lemmas 1 and 2 will lead to very weak results.
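
The obstruction for even n can be observed numerically. The following sketch computes, by exhaustive search, the smallest achievable value of $\max_i |t_i|$ over all proportional sets; the function name is hypothetical and |t| again denotes the absolute value of the residue closest to zero.

```python
from math import gcd

def min_max_chord(n, chords):
    """Minimum over l coprime to n of the largest centred residue of l*s."""
    def centered(x):
        x %= n
        return min(x, n - x)

    return min(max(centered(l * s) for s in chords)
               for l in range(1, n) if gcd(l, n) == 1)
```

For n = 16 (m = 8) and $S_0 = \{\pm 1, \pm 9\}$ the best value is 5, which is above n/4 = 4, so no choice of multiplier brings both chords close to zero.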

The difficulty of finding a precise bisection width when n is composite is not
really surprising. There is nothing new in the fact that the arithmetic structure
of n (e.g., primality) plays an important role. Let us recall that the simple
isomorphism rule only holds for special cases, including circulant graphs with
a prime number of vertices [12], but is not true for most values of n (see previous
section). More than 35 years after being conjectured [1], there are still some cases
for which it is unknown if it holds. In composite cases, the intricate relationship
between chords requires a particular method for each case.

3 Gossiping in Circulant Graphs

Information dissemination is the most important communication problem in 
interconnection networks. Three basic communication problems are: Broadcast 
(one-to-all) : one node has a piece of information and has to communicate this 
information to all the other nodes; Accumulation (all-to-one): all the nodes have 
a different piece of information and want to communicate this information to the 
same particular node; Gossip (all-to-all): each node has a piece of information 
and wants to communicate this information to all the other nodes (such that all 
nodes learn the cumulative message). 

Gossiping is the communication problem where each node of a network has a
piece of information and wants to communicate this information to all the other
nodes (such that all nodes learn the cumulative message). A communication
algorithm consists of a number of communication rounds during which nodes 
are involved in communications. Let g{G) denote the number of rounds of the 
optimal gossip algorithm for G. The communication algorithm necessary to solve 
this problem depends on the communication model. 

Several communication modes exist. The vertex-disjoint paths mode (VDP)
assumes: (i) a communication involves exactly two nodes which can be at
distance more than 1, (ii) any two paths corresponding to simultaneous
communications must be vertex-disjoint. Similarly, the line mode or edge-disjoint paths
mode (EDP) assumes: (i) a communication involves exactly two nodes which
can be at distance more than 1, (ii) any two paths corresponding to
simultaneous communications must be edge-disjoint. The mode of communication
also depends on the type of communication links available: (a) half-duplex (or
1-way) or (b) full-duplex mode (or 2-way). In the 2-VDP mode (resp., 2-EDP),
two nodes involved in a 2-way VDP communication (resp., EDP communication)
can exchange their information. In the 1-VDP mode (resp., 1-EDP), the
information will flow in the 1-way direction from one node to the other.

3.1 Bisection Lower Bounds 

In this paper, we give lower bounds on the gossip complexity for the circulant
graphs of |V| = n = p vertices with p prime. A direct relationship exists between
the bisection width and the gossip complexity. Let $\vartheta = (\log(1 + 5^{1/2}) - 1)^{-1} =
1.440\ldots$ The following statement gives a summary of several known bounds
from [18] (for the EDP mode) and from [19] (for the VDP mode):

Lemma 3. Let G be a network of edge-bisection ew(G) and of vertex-bisection
vw(G).

- In the 2-EDP mode, $g(G) \ge 2 \log n - \log ew(G) - \log\log n - 4$.

- In the 2-VDP mode, $g(G) \ge 2 \log n - \log vw(G) - \log\log vw(G) - 6$.

- In the 1-VDP mode, for any “well-structured” gossip algorithm,
$g_w(G) \ge 2 \log n - (2 - \vartheta)(\log vw(G) + \log\log vw(G)) - 15$.
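
To see how the three bounds of Lemma 3 behave for given bisection widths, they can be evaluated directly. The sketch below assumes that all logarithms are to base 2 (the usual convention for round counts, not stated explicitly here) and that vw > 1 so that the iterated logarithm is defined; the function and constant names are hypothetical.

```python
from math import log2, sqrt

# The constant (log(1 + sqrt(5)) - 1)^(-1) = 1.440... used in the 1-VDP bound.
THETA = 1.0 / (log2(1.0 + sqrt(5.0)) - 1.0)

def gossip_lower_bounds(n, ew, vw):
    """Evaluate the three lower bounds of Lemma 3 for n nodes,
    edge-bisection width ew and vertex-bisection width vw."""
    two_log_n = 2.0 * log2(n)
    return {
        "2-EDP": two_log_n - log2(ew) - log2(log2(n)) - 4.0,
        "2-VDP": two_log_n - log2(vw) - log2(log2(vw)) - 6.0,
        "1-VDP": two_log_n
                 - (2.0 - THETA) * (log2(vw) + log2(log2(vw))) - 15.0,
    }
```

For instance, with n = 2^16 and ew = vw = 2^8 the 2-EDP bound evaluates to 16 rounds and the 2-VDP bound to 15 rounds. For small n the additive constants dominate and the bounds can be negative, that is, trivial.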

By using our Theorems 1 and 2, we obtain the following lower bounds. 

Theorem 3. Let n = p be prime. For any gossip algorithm running on a
circulant graph of degree 2r with p vertices and chord set $S = \{\pm s_1, \ldots, \pm s_r\}$, the
number of rounds is at least:

- In the 2-EDP mode, provided that r = o(log p), $g(G) \ge (1 + \frac{1}{r}) \log p + o(\log p)$.

- In the 2-VDP mode, $g(G) \ge (1 + \frac{1}{r}) \log p - \log\log p - 8$.

- In the 1-VDP mode, provided that r = o(log p),
$g_w(G) \ge (\vartheta + \frac{2 - \vartheta}{r}) \log p - (2 - \vartheta) \log\log p - 17$.

Of course, if r = o(log p) then, in the 2-VDP mode, the slightly sharper
bound $g(G) \ge (1 + \frac{1}{r}) \log p - (\log\log p + \log r + 5)$ holds.
3.2 Generic 2-VDP Gossiping in Circulant Graphs

It is clear that “specialised” algorithms must be used to obtain tight complexity
bounds for specific chord sets. (For example, it is easy to see that a circulant
graph of $2^r$ nodes with chord set $S = \{\pm 1, \pm 2, \pm 2^2, \ldots, \pm 2^{r-1}\}$ perfectly embeds
a hypercube, and thus, can gossip with the minimum number of rounds.)

In this paper, we only focus on “generic” algorithms that work correctly for
any circulant graph given as input. Although we can only bound the number
of rounds required by the algorithms when n is prime, let us emphasize that the
algorithms described below complete the gossip correctly even if n is composite.

A popular strategy, introduced in [16], to solve the gossip problem is to use a
3-phase algorithm described as follows. Let G(V, E) be the graph corresponding
to the topology of the network. Let a(G) be any subset of nodes of G (called
the accumulation set). Divide G into m = |a(G)| connected components (called
accumulation components) of size $\lceil n/m \rceil$, such that each connected component
contains exactly one accumulation node of a(G). The 3-phase gossip algorithm
for G with respect to a(G) follows the phases (where $1 \le i \le m$): (1)
Accumulation: each accumulation node $a_i$ accumulates the information from the
nodes lying in its accumulation component $A_i$; (2) Gossip: perform a gossip
algorithm among the nodes $a_i$ of a(G); (3) Broadcast: each node $a_i$ broadcasts the
cumulative message in its component $A_i$.

For the 2-VDP mode, the accumulation problem can be considered as the
reverse of the broadcast problem, and hence, a similar strategy can be used. In [16], the
authors proved that, with the 2-VDP mode, broadcasting (resp. accumulating)
in a Hamiltonian path of k nodes can be done in $\lceil \log k \rceil$ rounds. By analogy to
notations in the 1-VDP mode [17], we call a 3-phase algorithm well-structured
if the gossip phase is an implementation of an optimal gossip algorithm (for
example, in the complete graph $K_m$, or a hypercube $Q_{\lceil \log m \rceil}$) and takes $\lceil \log m \rceil$ rounds.

With the 2-VDP mode a well-structured gossip algorithm only requires
$\lceil \log n/m \rceil + \lceil \log m \rceil + \lceil \log n/m \rceil \le 2 \lceil \log n \rceil - \lceil \log m \rceil$ rounds, when the
accumulating components are Hamiltonian paths. An upper bound on g(G) can be
obtained by constructing a well-structured gossip algorithm $g_w$ by taking the m
nodes of the vertex bisector as accumulation nodes. It is easy to see that, when
$m = n^{1 - 1/r}$, the number of rounds is nearly optimal: $g_w(G) \le (1 + \frac{1}{r}) \log n$.
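
The round count of a well-structured 3-phase algorithm is easy to tabulate. The following sketch assumes base-2 logarithms, components of size ⌈n/m⌉ that are Hamiltonian paths, and an optimal gossip among the m accumulation nodes; the function names are hypothetical.

```python
def ceil_log2(x):
    """Smallest t with 2**t >= x, for an integer x >= 1."""
    return (x - 1).bit_length()

def three_phase_rounds(n, m):
    """Rounds used by accumulation, gossip and broadcast in 2-VDP mode."""
    component = -(-n // m)               # ceil(n / m) nodes per component
    accumulate = ceil_log2(component)    # accumulation along a path
    gossip = ceil_log2(m)                # gossip among accumulation nodes
    broadcast = accumulate               # broadcast mirrors accumulation
    return accumulate + gossip + broadcast
```

For n = 64 and m = 8 (so r = 2 and m = n^{1/2}) this gives 3 + 3 + 3 = 9 rounds, which equals (1 + 1/2) log n.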

In the following, we will exploit the knowledge of the existence of such m 
nodes when n is prime by computing (in polynomial time) a bisector set of 
size m (as described in Theorem 2). We first describe a well-structured gossip 
algorithm for a specific infinite family of circulant graphs of degree four with 
n prime. We then show how to extend this strategy to other circulant graphs. 
Let us consider the circulant graph G of degree 4 with n vertices and chord set
$S = \{\pm 1, \pm s_2\}$ (that is, r = 2) with $s_2 = 2^d$ and $s_2(s_2 - 1) < n < s_2^2$ (that
is, $s_2 \approx n^{1/2}$). Without loss of generality we can label the nodes [0, ..., (n − 1)]
cyclically along the chord +1. We partition G into $m = s_2$ connected components
of, up to, $\lceil n/s_2 \rceil \le s_2 = 2^d$ nodes:

$A_i = [i, s_2 + i, 2 s_2 + i, \ldots, (s_2 - 1) s_2 + i]$, $0 \le i < n - (s_2 - 1) s_2$,

$A_i = [i, s_2 + i, 2 s_2 + i, \ldots, (s_2 - 2) s_2 + i]$, $n - (s_2 - 1) s_2 \le i \le s_2 - 1$,

where each component has an accumulating node $a_i = i$, $0 \le i \le (s_2 - 1)$. This
also defines $s_2$ consecutive segments $S_k$, $0 \le k \le (s_2 - 1)$, along the +1 chords.
The first segment, $S_0$, corresponds to the accumulation nodes, and all segments
but possibly the last one are of size $s_2$. Clearly, each accumulating
component is connected by a Hamiltonian path along the chord $+s_2$. With the
2-VDP mode, the accumulating phase and the broadcasting phase take at most
$\log s_2 = d < \log n^{1/2} + 1 = \frac{1}{2} \log n + 1$ rounds respectively.
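
The partition into accumulation components is simply the set of residue classes modulo s_2 along the ring. A minimal sketch, with a hypothetical function name and nodes labelled 0, ..., n − 1 along the chord +1:

```python
def accumulation_components(n, s2):
    """Component A_i lists the nodes i, s2 + i, 2*s2 + i, ... below n.

    Consecutive nodes of a component are joined by the chord +s2, so each
    component is a path, and its first node i is the accumulation node a_i.
    """
    return [list(range(i, n, s2)) for i in range(s2)]
```

For n = 13 and s_2 = 4 (so that s_2(s_2 − 1) = 12 < 13 < 16), the components are [0, 4, 8, 12], [1, 5, 9], [2, 6, 10] and [3, 7, 11]; only A_0 reaches into the last, incomplete segment.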

We say that the nodes i and j exchange information through the segment
$S_k$, if the information is passed first through k chords ($+s_2$), then through the
chords (±1) between the node $k s_2 + i$ and the node $k s_2 + j$ (in the segment $S_k$
of nodes $[k s_2, \ldots, k s_2 + (s_2 - 1)]$), and finally back through k chords ($-s_2$). For
the gossip phase of the accumulating nodes, we present the recursive Algorithm
(see also Figure 2) similar to the algorithm presented in [17] for square grids
of $n = 2^{2d}$ nodes and side-length $2^d$. Initially, the algorithm is started by
running Gossip(0, $s_2 - 1$).

Procedure Gossip(a, b)

if (b − a) > 1 do in parallel

Gossip(a, a + ⌊(b − a)/2⌋) and Gossip(a + ⌈(b − a)/2⌉, b)
endo in parallel

for a ≤ i ≤ a + ⌊(b − a)/2⌋ do in parallel

exchange information between i and j = b − (i − a) through segment S_k, k = ⌊(j − i)/2⌋
endo in parallel
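
The recursive procedure can be simulated to check that every accumulation node ends up with the cumulative message. The sketch below is a reading of the procedure under stated assumptions: the number of nodes b − a + 1 is a power of 2, the partner of node i inside the interval [a, b] is its mirror image j = a + b − i, and the segment index is k = ⌊(j − i)/2⌋. The function names are hypothetical.

```python
def gossip_schedule(a, b):
    """Return the list of rounds; each round is a list of (i, j, k) triples
    meaning that i and j exchange information through segment S_k."""
    if b - a < 1:
        return []
    half = (b - a) // 2
    rounds = []
    if b - a > 1:
        left = gossip_schedule(a, a + half)
        right = gossip_schedule(a + half + 1, b)
        # The two recursive calls run in parallel, round by round.
        rounds = [l + r for l, r in zip(left, right)]
    # Final round: mirror exchange between the two halves.
    rounds.append([(i, a + b - i, (a + b - 2 * i) // 2)
                   for i in range(a, a + half + 1)])
    return rounds

def simulate(num_nodes):
    """Run the schedule and return what each node knows at the end."""
    known = [{v} for v in range(num_nodes)]
    for exchanges in gossip_schedule(0, num_nodes - 1):
        for i, j, _ in exchanges:
            merged = known[i] | known[j]
            known[i], known[j] = merged, set(merged)
    return known
```

For 8 accumulation nodes the schedule has 3 rounds, and the last round pairs (0, 7), (1, 6), (2, 5), (3, 4) through the segments S_3, S_2, S_1, S_0 respectively, so that nested pairs use distinct segments.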










Fig. 2. The communication paths in the 2-VDP mode.



By induction, it is easy to see that each node learns the cumulative message
and the communication at each round is vertex-disjoint. Any two pairs of nodes
(i, j) and (i', j') exchanging information through a segment k define two
non-intersecting sections in the segment $S_k$, as $k = \lfloor \frac{j - i}{2} \rfloor = \lfloor \frac{j' - i'}{2} \rfloor$ we have i < j <
i' < j'. In the case i < i' < j' < j, we have $\lfloor \frac{j - i}{2} \rfloor = k \ne k' = \lfloor \frac{j' - i'}{2} \rfloor$.

Obviously, $\log s_2 < \frac{1}{2} \log n + 1$ rounds are sufficient to gossip among the
accumulating nodes, and thus, the total number of rounds of this well-structured
gossip algorithm is $g_w(G) < (1 + \frac{1}{2}) \log n + 3 = \frac{3}{2} \log n + 3$.

This is near optimal as it almost matches the lower bound introduced in
Theorem 3 and can be generalised to other cases. Note that the last segment $S_{s_2 - 1}$
is not used and does not need to be full. In fact, only the first k segments,
k = [^^J, are used by the algorithm. Hence the algorithm can be adapted to 
run correctly for < n < s|, and takes either | logn or | logn -I- 3 rounds. 
In the case that $s_2$ is not a power of 2 (i.e., $2^d < s_2 < 2^{d+1}$ for some d),
a similar algorithm can be used with two extra rounds. After the accumulation
phase, each accumulation node j, $2^d \le j \le s_2 - 1$, gossips its information to the
accumulation node i, $2^d - 1 \ge i \ge 2^d - 1 - (s_2 - 1 - 2^d)$, through the segment $S_k$,
$k = \lfloor (j - i)/2 \rfloor$. After the gossip phase Gossip(0, $2^d - 1$), and before the broadcast
phase, each node i sends the cumulative message to the respective node j.

When considering circulant graphs with n prime and chord set $S =
\{\pm s_1, \pm s_2\}$, $s_1 < s_2$ (instead of the specific chord set $S = \{\pm 1, \pm s_2\}$),
Theorem 2 proves that for circulant graphs G of degree 4 with n vertices, where
n = p is prime, it is possible to generate a representation of G with a chord set
$T = \{\pm t_1, \pm t_2\}$ such that $\langle S \rangle_n \simeq \langle T \rangle_n$ and $t_1 < t_2 < 4 n^{1 - 1/2} = 4 n^{1/2}$. Using T
as the chord set, and if $\gcd(t_1, t_2) = 1$, we can partition G into $m = \lceil n/t_2 \rceil$
connected components (along the chord $+t_1$) of, up to, $t_2$ nodes. Each component
has an accumulating node $a_i = i t_2$, $0 \le i \le (m - 1)$. Clearly, each
accumulating component is connected by a Hamiltonian path along the chord $+t_1$. With
the 2-VDP mode, the accumulating phase and the broadcasting phase take at
most $\lceil \log t_2 \rceil$ rounds. Representing G with chord set $W = \{\pm 1, \pm w_2\}$ where $w_2 =
t_1 t_2^{-1}$
(mod n), the accumulating nodes $a_i$, $0 \le i \le m - 1$, are now consecutive along
the chord +1. It is easy to see that if $2n/w_2 < m < w_2$, the generic Gossip
algorithm will run correctly within a number of rounds close to the minimum.

With chord set S = {±1, . . . , r > 3, we can give 

tight upper bounds on the gossiping complexity in the 2-VDP mode using an 
algorithm similar to [17] for r-dimensional grids Gr(] of n nodes and side-length 
^ r > 3, when n is an rth power, and extend from these algorithms. 

4 Concluding Remarks 

We introduced a novel approach to compute (in polynomial time) a convenient
isomorphic representation of a circulant graph to exploit the knowledge of its
bisector. This method is general and may be of use in other applications. As an
example, we introduced a “generic” algorithm that gossips efficiently in circulant
graphs of prime order. Although “specialised” algorithms should be used for
specific chord sets to obtain tight bounds, our algorithms work in all cases and
are nearly optimal in important cases. We showed that in composite cases, the
intricate relationship between chords requires a particular method for each case.

Let us note that the problem of finding l for which T = lS minimizes one
of the expressions of Corollary 1 is an instance of the famous shortest vector
problem for metrics $L_\infty$ and $L_1$. Thus one can use the variety of the algorithms
available for this problem, e.g. [2,5,21] in order to find an optimal value of l or a
value which gives almost optimal results. Typically these algorithms target the
metric $L_2$ but some can apparently be adjusted to the metrics $L_\infty$ and $L_1$ and
in any case they can be used directly for approximate solutions if one uses that:

$$\sum_{i=1}^{r} |t_i| \le r^{1/2} \Bigl( \sum_{i=1}^{r} t_i^2 \Bigr)^{1/2} \quad\text{and}\quad \max_{1 \le i \le r} |t_i| \le \Bigl( \sum_{i=1}^{r} t_i^2 \Bigr)^{1/2}.$$

Unfortunately, when r is growing, all known algorithms either have exponential
time or output (in polynomial time) a vector which is an exponential factor
longer than the shortest vector. However for fixed r or r slowly growing with p (as
about log log p) these algorithms are polynomial in log p.
Abstract. We study the rendezvous search problem for k > 2 mobile
agents in an n node ring. Rather than using randomized algorithms or
different deterministic algorithms to break the symmetry that often arises
in this problem, we investigate how the mobile agents can use identical
stationary tokens to break symmetry and solve the rendezvous problem.
After deriving the conditions under which identical stationary tokens
can be used to break symmetry, we present several solutions to the
rendezvous search problem. We derive the lower bounds of the memory
required for mobile agent rendezvous and discuss the relationship between
rendezvous and leader election for mobile agents.



1 Introduction 

In the mobile agent rendezvous search problem, k mobile agents located on an 
n node network are required to meet or rendezvous. When the mobile agents 
or the network nodes are uniquely numbered, solving the rendezvous search 
problem is trivial. When the mobile agents are identical and the network nodes 
are anonymous, however, the resulting symmetry can make the problem difficult 
to solve. 

Symmetry in the rendezvous search problem is typically broken by using
randomized algorithms or different deterministic algorithms [2]. While Baston and
Gal [5] mark the starting points of the searchers, they still rely on randomization
or different deterministic algorithms. Kranakis et al. [7], however, studied how
two searchers, i.e., mobile agents, on an n node ring can use identical tokens to
break the symmetry.

Most of the literature on the rendezvous search problem deals with the case
of k = 2 searchers. The few exceptions include Lim, Beck, and Alpern [8],
Alpern [1], Pikounis and Thomas [9], and Gal [6], but that research focuses
almost exclusively on the line. In this paper, we investigate the mobile agent
rendezvous search problem for k > 2 mobile agents on an n node ring, where
the mobile agents use tokens to break symmetry.
1.1 The Network Model

The model consists of k > 2 identical mobile agents that are located on separate 
nodes of an anonymous, synchronous n node ring. The mobile agents may or may 
not share a common orientation, i.e., agree on the direction that is clockwise. A 
given node requires only enough memory to host a token and at most k mobile 
agents. Each mobile agent, MA, owns a single identical stationary token, i.e., 
the tokens are indistinguishable and once they are placed on a node, they must 
remain in place. A token or MA at a given node is visible to all MAs on the same 
node, but is not visible to any other MAs. When a MA is visible, its state is also 
visible. The MAs follow the same deterministic algorithm and begin execution 
of that algorithm at the same time. 

Memory permitting, a MA can count such things as the number of nodes 
visited, the number of tokens discovered, the number of MAs discovered, the 
number of nodes between tokens, or the total number of nodes in the network. 
In addition, a MA might already know the number of nodes in the network 
or some other network parameter and requires sufficient memory to store this 
information. Since the MAs are identical, they face the same limitations on 
their knowledge of the network. Rendezvous occurs when all the MAs meet on 
a network node. 

An instance of the mobile agent rendezvous problem is solvable when the 
mobile agents 

1. can correctly determine whether or not rendezvous is possible and then 

2. rendezvous or stop as appropriate. 

Solving the mobile agent rendezvous problem involves making the correct 
choice, i.e., stopping if rendezvous is impossible and achieving rendezvous if 
possible. 

In this paper, we assume that the MAs always place their tokens on their 
respective starting nodes in the first step of any algorithm they execute. The 
tokens are identical and stationary so that intertoken distances that exist at the 
beginning of any algorithm persist throughout the algorithm. 



1.2 Our Contribution 

In this paper, we continue the study of the mobile agent rendezvous search prob- 
lem in the ring [7]. Our model consists of k>2 identical MAs in an anonymous, 
synchronous, and possibly oriented n node ring. Rather than using random- 
ized algorithms or different deterministic algorithms, we use identical stationary 
tokens to break symmetry so that the MAs can run the same deterministic 
algorithm. 

First, we prove that if neither k, the number of MAs, nor n, the number of 
nodes in the ring, are known then the rendezvous problem is unsolvable. For the 
remainder of the paper, we assume that k is known. Next, we prove that when 
k is known, rendezvous is possible if and only if S, the sequence of intertoken 
distances, is aperiodic. 
We present three algorithms that solve the rendezvous problem when k is
known. Either k or n must be known, so when n is known, k can be determined
in one traversal of the ring. When S, the sequence of intertoken distances between
the MAs, is aperiodic, the algorithms guarantee that rendezvous occurs. However,
if S is periodic and thus rendezvous is impossible, the algorithms guarantee
that the MAs stop. The memory and time complexities for these algorithms are
presented in Table 1. We also present three algorithms that solve the rendezvous
problem for values of k and n that satisfy various conditions such as primality.
The memory and time complexities for these algorithms, numbered 4 through
6, are also presented in Table 1.



Table 1. The Rendezvous Search Problem with k > 2 Mobile Agents 



Algorithm   Memory          Time
1           O(k lg n)       O(n)
2           O(lg n)         O(kn)
3           O(k lg lg n)    O(n lg n / lg lg n)
4           O(lg n)         O(n)
5           O(lg k)         O(n lg k)
6           O(lg k)         O(n)



Kranakis et al. [7] proved that solving the rendezvous problem with k =
2 mobile agents requires at least Ω(lg lg n) memory. In this paper, we prove
that solving the rendezvous problem with k > 2 mobile agents requires at least
Ω(lg lg n + lg k) memory.

Finally, we prove that if the MAs share a common orientation, then the 
rendezvous problem and the leader election problem for MAs are equivalent. 
A solution to the first problem can be used to derive a solution for the second 
problem and vice versa. If the MAs do not share a common orientation, however, 
then the leader election problem for MAs is strictly more complex than the 
rendezvous problem since a solution to the latter problem does not always imply 
a solution to the former problem. 



1.3 Outline of the Paper 

In section 2, we present the impossibility results for the MA rendezvous search 
problem with k > 2 MAs. In section 3, we prove that rendezvous is possible if and 
only if the sequence of intertoken distances is aperiodic. We present unconditional 
solutions for rendezvous in section 4 and derive the lower bounds on memory 
necessary for rendezvous in section 5. In section 6, we present solutions to the 
rendezvous problem for specific circumstances and then, in section 7, we discuss 
the relationship between leader election and rendezvous for mobile agents. 
2 Impossibility Results

To solve the rendezvous problem, mobile agents must recognize when rendezvous 
is possible. As the next theorem shows, knowledge of k or n is a necessary
condition for the rendezvous of identical MAs in an anonymous, synchronous 
ring. 

Theorem 1. When each MA in the ring knows neither n, the number of nodes, 
nor k, the number of mobile agents, the mobile agent rendezvous search problem 
is unsolvable. 

Theorem 1 indicates that knowing either k or n is a necessary condition for
solving the rendezvous problem. For the remainder of the paper, we shall assume 
that k is known. 



3 Solving the Rendezvous Problem 

As stated in section 1.1, we assume that the MAs always place their tokens on 
their respective starting nodes in the first step of any algorithm they execute. The 
tokens are identical and stationary so that the intertoken distances that exist at 
the beginning of an algorithm persist throughout the algorithm. Since the MAs 
are identical and run the same deterministic algorithm in an anonymous ring, 
rendezvous can only occur if the intertoken distances can be used to break the 
resulting symmetry. 

Theorem 2. Rendezvous is guaranteed if and only if S, the sequence of inter- 
token distances, is aperiodic. 
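
The periodicity condition in Theorem 2 can be checked mechanically. The following is a minimal sketch, assuming S is given as a Python list of the k intertoken distances read once around the ring; the function name is_periodic is hypothetical and not part of the paper. It treats S as a circular sequence and calls it periodic when it equals one of its own nontrivial rotations, which is equivalent to S being a block repeated more than once.

```python
def is_periodic(distances):
    """Return True if the circular sequence equals a nontrivial rotation of itself.

    Assumption: `distances` lists the k intertoken distances in ring order.
    """
    seq = list(distances)
    k = len(seq)
    for shift in range(1, k):
        # A rotation by `shift` positions that reproduces the sequence
        # means the sequence is a repeated block.
        if seq[shift:] + seq[:shift] == seq:
            return True
    return False
```

For instance, distances 1, 2, 1, 2 form a periodic sequence, so the MAs cannot break symmetry, while 1, 2, 4 is aperiodic.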

4 Unconditional Solutions 

Given the results of section 3, the mobile agent rendezvous problem can be solved 
when the MAs can determine that S, the sequence of intertoken distances, is 
aperiodic. 

First, assume that each MA has O(k lg n) memory. The MAs know k but
do not necessarily know n or share a common orientation. Consider the following 
algorithm. 

Algorithm 1 

1. Release the token at the starting node. 

2. Choose a direction and begin to walk around the ring. 

3. Compute the k intertoken distances d_1, ..., d_k.

4. If S = d_1, ..., d_k is periodic, then stop. (Rendezvous is not guaranteed.)

5. Set forward = d_1, ..., d_k and reverse = d_k, ..., d_1.

6. Let lexi(someSequence) denote the lexicographically maximum rotation
of the sequence someSequence.

7. Set forward = lexi(forward) and reverse = lexi(reverse).

8. If forward and reverse differ, then

i) determine which of these sequences is the lexicographic maximum 
and rendezvous at the node where this sequence starts. 

ii) else let MAi and MAj denote the MAs at the beginning 
of forward and reverse respectively. 

9. If MAi and MAj are the same MA, then rendezvous at the node 
where MAi resides. 

10. If MAi and MAj are distinct MAs, then look at the two paths 
between MAi and MAj in the ring. 

i) If only one of the paths has an odd number of nodes,
then rendezvous at the node in the midpoint of that path. 

ii) If both paths have an odd number of nodes, then 

a) if the paths differ in length, rendezvous at the 
midpoint of the shorter path, 

b) else compare the sequences of intertoken distances for 
the two paths and rendezvous at the node in the midpoint 
of the path that is the lexicographic maximum. 

iii) If both paths have an even number of nodes, then 
rendezvous at the node in the midpoint of the path that 
contains an odd number of MAs. 
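
Steps 5 through 8 of Algorithm 1 rely on the lexicographically maximum rotation of a sequence. The sketch below is one simple way to compute it, assuming the distances are held in a Python list; the names max_rotation and choose_direction are hypothetical, and the quadratic-time search is chosen for clarity rather than efficiency.

```python
def max_rotation(seq):
    """Return (rotation, start) for the lexicographically maximum rotation.

    Assumption: `seq` is a non-empty list of intertoken distances.
    """
    seq = list(seq)
    k = len(seq)
    # Try every starting index and keep the one giving the largest rotation.
    start = max(range(k), key=lambda s: seq[s:] + seq[:s])
    return seq[start:] + seq[:start], start


def choose_direction(distances):
    """Compare lexi(forward) with lexi(reverse), as in steps 7 and 8."""
    forward, _ = max_rotation(distances)
    reverse, _ = max_rotation(list(distances)[::-1])
    if forward == reverse:
        return "tie"  # the remaining steps of the algorithm handle this case
    return "forward" if forward > reverse else "reverse"
```

With distances 1, 2, 4 the forward maximum rotation is 4, 1, 2 and the reverse one is 4, 2, 1, so the reverse reading wins.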

Theorem 3. If each MA has memory O(k lg n), then the mobile agent rendezvous
problem can be solved in time O(n).

If the MAs are restricted to memory O(lgn), the mobile agent rendezvous prob- 
lem is still solvable. 

Consider the following algorithm. In each round, a MA may become inactive 
and thus spend the rest of the algorithm at its starting node. If the MAs share 
a common orientation, then the MAs travel in the same direction, so an active 
MA can identify an inactive MA because the former will find the latter stopped. 
If the MAs do not share a common orientation, however, an active MA that 
meets another MA can not tell if the latter MA is inactive or merely travelling 
in the opposite direction. In this case, each MA needs to set a bit that indicates 
whether it is active or inactive. 

With only O(lg n) memory, a MA needs more than one traversal of the ring
to determine if S is aperiodic. Consider the following algorithm. 

Algorithm 2 

1. Release the token at the starting node. 

2. Set c = 1. (The number of the current round.)

3. Set active = 1. (A bit to indicate whether the MA is active.) 

4. Set inactive = 0. (Count the number of inactive MAs.)

5. Choose a direction and begin to walk around the ring. 

6. Increment inactive each time an inactive MA is met. 

7. Compute the distance to the cth token, d_c, i.e., if c = 1, count the
distance to the first token and if c = 2, count the distance to the
second token, etc.

8. Continue to walk around the ring and compare d_c to each intertoken
distance between c tokens.

9. If MA sees an intertoken distance d_i such that d_i > d_c, then the
MA continues in the same direction and becomes inactive, i.e., sets
active = 0, when it reaches its starting node.

10. If MA did not see an intertoken distance d_i such that d_i > d_c, then
the MA remains active when it returns to its starting node.

11. If only one MA remains active, i.e., inactive = k — 1, then walk 
around the ring and arrange the rendezvous. (A MA that reaches 
this point is the sole active MA.) 

12. If c == 2 and inactive == 0, then stop. (All the intertoken distances 
are equal and thus rendezvous is impossible.) 

13. Else set c = c + 1 and inactive = 0. 

14. Repeat from step 5. 

Theorem 4. If each MA has memory O(lgn), then the mobile agent rendezvous 
problem is solvable in time O(kn).

Algorithms 1 and 2 solve the mobile agent rendezvous problem when each
MA has memory O(k lg n) and O(lg n) respectively. It is also possible, however,
to solve the mobile agent rendezvous problem when each MA has memory
O(k lg lg n).

Let p_1, ..., p_r denote the first r prime numbers such that ∏_{i=1}^{r} p_i > n.
An active MA needs to recognize if another MA is active. Without a common
orientation, each MA needs to set a bit to indicate whether it is active.
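
The primes used by the next algorithm are the shortest initial run of primes whose product exceeds n. A minimal sketch of how that list could be computed, assuming n is known and using trial division; the function name is hypothetical and the paper does not prescribe how an MA obtains these primes.

```python
def primes_with_product_exceeding(n):
    """Return the first r primes p_1, ..., p_r such that their product exceeds n."""
    primes = []
    product = 1
    candidate = 2
    while product <= n:
        # `candidate` is prime if no smaller prime divides it; all smaller
        # primes are already in `primes` because candidates are visited in order.
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes
```

For a ring of n = 100 nodes this yields 2, 3, 5, 7 (product 210), so r = 4 rounds suffice in that case.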

Algorithm 3 

1. Release token at the starting node. 

2. Set active = 1. 

3. Set p_r to the first prime such that ∏_{i=1}^{r} p_i > n.

4. Set p_i = p_1 = 2.

5. Set α = k, the number of active MAs.

6. Walk around the ring and compute the intertoken distances mod p_i
between the α active MAs, i.e., d_1, ..., d_α mod p_i.

7. Set forward = d_1, ..., d_α mod p_i.

8. Set reverse = d_α, ..., d_1 mod p_i.

9. If forward is periodic, i.e., forward = (d_1, ..., d_{α/a})^a mod p_i, then

i) if at start of a block (d_1, ..., d_{α/a}), remain active.

ii) else set active = 0.

iii) if p_i < p_r, then

a) set p_i = p_{i+1}, α = a, and repeat from step 6.

b) else stop, since rendezvous is impossible.

10. If forward is aperiodic, then let lexi(someSequence) denote the lexi- 
cographically maximum rotation of the sequence someSequence. 

11. Follow steps 7 through 10 of Algorithm 1. 

The following lemma is necessary for the proof of Theorem 5 below. 
Lemma 1. Consider a prime p such that 2 < p < r. Assume that for all primes
p_i < p, the sequence of distances mod p_i between the α_i active MAs is periodic
such that d_1, ..., d_{α_i} = (d_1, ..., d_{α_i/a_i})^{a_i} mod p_i and a_i | α_i.
Let the first MA in each occurrence of the block d_1, ..., d_{α_i/a_i} remain active,
while the remaining MAs become inactive. If the sequence of distances mod p between
the α active MAs is periodic, then the original intertoken distances can be
partitioned into t | k equal length blocks with sums σ_1, σ_2, ..., σ_t that are
congruent modulo all the primes p_1, p_2, ..., p.



Theorem 5. If each MA has memory O(k lg lg n), then the rendezvous problem
can be solved in time O(n lg n / lg lg n).

Proof of Theorem 5. 

Algorithm 3 solves the rendezvous problem if it stops the MAs when rendezvous is
impossible and otherwise ensures a rendezvous. Suppose that for all p_i < p_r, the
sequence of distances mod p_i between the active MAs is periodic. Algorithm 3
will stop the MAs in step 8 and indicate that rendezvous is impossible. Lemma 1
implies that in the last round of Algorithm 3, where p = p_r, the original
intertoken distances can be partitioned into a_r | k equal length blocks with sums
σ_i^r, i = 1, ..., a_r, that are congruent mod p for all p < p_r. The Chinese
Remainder Theorem then implies that the sums σ_i^r, i = 1, ..., a_r, are congruent
mod ∏_{i=1}^{r} p_i. Since ∏_{i=1}^{r} p_i > n, the original intertoken distances can be
partitioned into a_r | k equal length blocks with sums σ_i^r that are equal for all
i, i = 1, ..., a_r. This implies, however, that n = a_r σ_i^r for any i and thus a_r | n.
Since gcd(k, n) = g > 1, S is periodic, and Algorithm 3 correctly stops the
MAs in step 8.

Algorithm 3 must also guarantee that rendezvous occurs when possible, i.e.,
when S is aperiodic. Suppose not, i.e., S is aperiodic but rendezvous does not
occur. This implies that for all p_i < p_r, the sequences calculated in step 7 of
Algorithm 3 are periodic and thus all rounds of the algorithm will be executed.
In the final round, where p_i = p_r, Algorithm 3 will stop the MAs and indicate
that rendezvous is impossible. By the Chinese Remainder Theorem, however,
this implies that S is periodic and thus contradicts the fact that S is aperiodic.
Thus Algorithm 3 solves the mobile agent rendezvous problem.

If each mobile agent has memory O(k lg lg n), then Algorithm 3 correctly
determines whether rendezvous is possible and instructs the MAs to stop or
rendezvous as appropriate. In the worst case, rendezvous is impossible and the
MAs must complete all r rounds of Algorithm 3, where r is the smallest number
of prime numbers such that ∏_{i=1}^{r} p_i > n. Each of the r rounds takes n steps
so the time complexity is O(rn). Kranakis et al. [7] prove that r ∈ O(lg n / lg lg n),
so the time complexity of Algorithm 3 is O(n lg n / lg lg n). This completes the
proof of Theorem 5. ■
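
The role of the Chinese Remainder Theorem in the proof can be illustrated numerically: once the product of the primes exceeds n, the vector of residues of a value between 0 and n identifies that value, so block sums that agree modulo every prime must be equal. The sketch below checks this by brute force; the helper names are hypothetical and the check is an illustration, not part of the algorithm the MAs run.

```python
def residues(value, primes):
    """Return the tuple of residues of `value` modulo each prime."""
    return tuple(value % p for p in primes)


def all_residue_vectors_distinct(n, primes):
    """Check that no two values in 0..n share the same residue vector."""
    seen = {residues(v, primes) for v in range(n + 1)}
    return len(seen) == n + 1
```

With n = 100, the primes 2, 3, 5, 7 (product 210) separate all values, whereas 2, 3, 5 (product 30) do not, which is why the algorithm continues until the product exceeds n.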
5 Lower Bound on Memory

In Theorem 1 of section 2, we prove that the mobile agent rendezvous problem
can only be solved if either k or n is known. When k = 2, Kranakis et al. [7]
prove that Ω(lg lg n) memory is required to solve the rendezvous problem when
k is known.

Theorem 6. Solving the mobile agent rendezvous problem for k > 2 MAs requires
Ω(lg lg n + lg k) memory.

6 Conditional Solutions 

In the preceding two sections, we proved that the mobile agent rendezvous problem
is unconditionally solvable if each MA has memory Ω(lg lg n + lg k). In this
section, however, we explore conditional solutions, i.e., solutions for cases where 
k and n satisfy various conditions such as primality. A network designer or ad- 
ministrator may be able to choose the values of k and n so as to meet these 
conditions. 

The mobile agent rendezvous problem can be solved correctly whenever the
sequences S_i are aperiodic for all i, e.g., n is prime or n is the product of two
or more primes larger than k. A network designer or administrator, however,
probably cannot directly dictate that S_i is aperiodic for all i but they are able
to choose k and n. If gcd(k', n) = 1, ∀k' < k, e.g., n is prime or is the product of
two primes greater than k, then S_i is aperiodic for all i. The following algorithm
assumes an oriented ring. An active token is unoccupied while an inactive token
has a MA residing on it.
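
A designer choosing k and n can test the gcd condition directly. The sketch below is a hypothetical helper, not something given in the paper. It assumes the bound on k' is read inclusively (2 ≤ k' ≤ k), which matches the stated examples of n prime or n a product of primes greater than k; that reading is an assumption made here because the inequality sign in the text may have been damaged in extraction.

```python
from math import gcd


def satisfies_gcd_condition(k, n):
    """Return True if gcd(k', n) == 1 for every k' with 2 <= k' <= k.

    Assumption: the bound is inclusive of k itself.
    """
    return all(gcd(kp, n) == 1 for kp in range(2, k + 1))
```

For example, k = 4 MAs on a ring of n = 35 = 5 * 7 nodes pass the check, while k = 5 on the same ring does not.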

Algorithm 4 

1. Release the token at the starting node. 

2. Set active = 1. 

3. Set count = 0. 

4. Begin to walk around the ring in the clockwise direction. 

5. Compute the intertoken distances to the next three active tokens, i.e.,
d_1, d_2, d_3, and increment count for each inactive token passed.

6. If count == k - 1, arrange rendezvous. (Only active MA remaining.)

7. If d_2 > d_1 and d_2 > d_3, then remain active.

8. Else become inactive, i.e., set active = 0, continue in current direction
to starting node, and wait for further instructions.

9. Repeat from step 3. 
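
One elimination round of Algorithm 4 can be simulated centrally to see which MAs survive. The sketch below is an illustration under stated assumptions, not the distributed algorithm itself: it assumes active tokens are given as distinct node indices in clockwise order, that d_1 is the distance from an MA's own token to the next active token, and that d_2 and d_3 are the two following gaps. The function name one_round is hypothetical.

```python
def one_round(active, n):
    """Return the starting nodes of MAs that stay active after one round.

    Assumptions: `active` lists distinct node indices of active tokens in
    clockwise order on a ring of n nodes.
    """
    m = len(active)
    if m < 2:
        return list(active)
    survivors = []
    for j in range(m):
        # Own token followed by the next three active tokens, clockwise.
        a, b, c, d = (active[(j + t) % m] for t in range(4))
        d1 = (b - a) % n
        d2 = (c - b) % n
        d3 = (d - c) % n
        if d2 > d1 and d2 > d3:
            survivors.append(active[j])
    return survivors
```

On a ring of 7 nodes with tokens at nodes 0, 1, 3, only the MA that started at node 1 sees a middle gap larger than both neighbours. With equally spaced tokens (0, 2, 4 on a ring of 6 nodes) no MA survives, which illustrates why the conditions on k and n are needed.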

Theorem 7. When the MAs share a common orientation and gcd(k', n) = 1,
∀k' < k, then the mobile agent rendezvous problem can be solved with O(lg n)
memory and O(n) time in an oriented ring.

When k is prime, the ring is oriented, and gcd(k', n) = 1, ∀k' < k, then a
variation of Algorithm 4 solves the mobile agent rendezvous problem with O(lg k)
memory and O(n) time.
Algorithm 5

1. Release the token at the starting node. 

2. Set active = 1. 

3. Begin to walk around the ring in a clockwise direction. 

4. Execute round 1 of Algorithm 4 but calculate the intertoken distances
mod k. (All MAs will return to their starting nodes and those that
became inactive have set active = 0.)

5. (Now execute Algorithm 4 as if on a ring of size k. The distances
of interest are now the number of inhabited tokens between pairs of
empty tokens.)

6. Compute the number of inhabited tokens, i.e., tokens hosting inactive
MAs, met on the path to the next three uninhabited tokens, i.e.,
m_1, m_2, m_3.

7. If m_1 == k - 1, arrange the rendezvous. (Only one active MA left.)

8. If m_2 > m_1 and m_2 > m_3, then remain active.

9. Else become inactive, i.e., set active = 0, continue in current direction
to starting node, and wait for further instructions.

10. Repeat from step 5. 

Theorem 8. When the MAs share a common orientation, k is prime, and
gcd(k', n) = 1, ∀k' < k, the mobile agent rendezvous problem can be solved
with O(lg k) memory and O(n) time.

The following algorithm solves the rendezvous problem when gcd(k', n) = 1,
∀k' < k, but k is not prime.

Algorithm 6 

1. Release the token at the starting node. 

2. Set active = 1 and count = 0. 

3. Begin to walk around the ring in the clockwise direction. 

4. Compute the intertoken distances mod k to the next three active
tokens, i.e., d_1, d_2, d_3 mod k, and increment count for each inactive
token passed.

5. If count == k - 1, arrange rendezvous. (Only active MA remaining.)

6. If d_2 > d_1 mod k and d_2 > d_3 mod k, then remain active.

7. Else become inactive, i.e., set active = 0, and wait for further instructions.

8. Repeat from step 4. 

Theorem 9. When the MAs share a common orientation and gcd(k', n) = 1,
∀k' < k, then the mobile agent rendezvous problem can be solved with O(lg k)
memory and O(n lg k) time.

7 Leader Election and Rendezvous

The relationship between the rendezvous problem and the leader election prob- 
lem among the k MAs depends on whether the MAs share a common orientation. 
Theorem 10. If the MAs share a common orientation, then the leader election
problem among k MAs is equivalent to the rendezvous problem for those MAs. If 
the MAs do not share a common orientation, however, then the leader election 
problem is strictly more complex than the rendezvous problem. 

8 Conclusion 

After proving that the mobile agent rendezvous search problem is unsolvable 
when both k and n are unknown, we prove that rendezvous is possible if and 
only if the sequence of intertoken distances is aperiodic. We then present un- 
conditional and conditional solutions for the rendezvous problem. We derive the 
lower bounds on the memory required for mobile agent rendezvous and then dis- 
cuss the relationship between rendezvous and leader election for mobile agents. 

In future research, it would be interesting to study how changes in the model 
affect the complexity of the mobile agent rendezvous search problem. For exam- 
ple, it would be interesting to study a network topology that differs from the 
ring or the case where each mobile agent has more than one token. 
Abstract. Time synchronization is necessary in many distributed systems, but
achieving synchronization in sensornets, which combine stringent precision re- 
quirements with severe resource constraints, is particularly challenging. This chal- 
lenge has been met by the recent Reference-Broadcast Synchronization (RBS) 
proposal, which provides on-demand pairwise synchronization with low overhead 
and high precision. In this paper we introduce a model of the basic RBS synchro- 
nization paradigm. Within the context of this model we characterize the optimally 
precise clock synchronization algorithm and establish its global consistency. In the 
course of this analysis we point out unexpected connections between optimal clock 
synchronization, random walks, and resistive networks, and present a polynomial- 
time approximation scheme for the problem of calculating the effective resistance 
in a network based on min-cost flow. We also sketch a polynomial-time algorithm 
for finding a schedule of data acquisition giving the optimal trade-off between 
energy consumption and precision of clock synchronization. We also discuss syn- 
chronization in the presence of clock skews. In ongoing work we are adapting 
our synchronization algorithm for execution in a network of seismic sensors that 
requires global clock consistency. 



1 Introduction 

Many traditional distributed systems employ time synchronization to improve the con- 
sistency of data and the correctness of algorithms. Time synchronization plays an even 
more central role in sensornets, whose deeply distributed nature necessitates fine-grained 
coordination among nodes. Precise time synchronization is needed for a variety of sen- 
sornet tasks such as sensor data fusion, TDMA scheduling, localization, coordinated 
actuation, and power-saving duty cycling. Some of these tasks require synchronization 
precision measured in μsecs, which is far more stringent than the precision required in
traditional distributed systems. Moreover, the severe power limitations endemic in sen- 
sornets constrain the resources they can devote to synchronization. Thus, sensornet time 
synchronization must be both more precise, and more energy-frugal, than traditional 
time synchronization methods. 

The recent Reference-Broadcast Synchronization (RBS) design meets these two ex- 
acting objectives by producing on-demand pairwise synchronization with low overhead 
and high precision [7]. RBS is specifically designed for sensornet contexts in which 
(1) communications are locally broadcast, (2) the maximum speed-of-light delay be- 
tween sender and receiver is small compared to the desired synchronization precision, 
and (3) the delays between time-stamping and sending a packet are significantly more 
variable than the delays between receipt and time-stamping a packet (so estimates of
when a packet is sent are far noisier than estimates of when it is received). See [7] for
a much fuller discussion of this last point; measurements described therein suggest
that the receiving delays can be reasonably modeled as a Gaussian centered around
some mean, with the mean being the same for all nodes (assuming they share the same
hardware/software).

There is a vast body of work on clock synchronization in the theory and distributed
systems literature [1,4,10,18,22]; see [6] and references therein for a comprehensive
review. We note, however, that most traditional methods synchronize a receiver with a 
sender by transmitting current clock values, and are thus sensitive to transmission delay 
variability and asymmetry. In contrast, RBS avoids these vulnerabilities by synchronizing 
receivers with each other leveraging the special properties of sensornet communications. 
Reference broadcast signals are periodically sent in each region, and sensornet nodes 
record the times-of-arrival of these packets. Nodes within range of the same reference 
broadcast can synchronize their clocks by comparing their respective recent time-of- 
arrival histories. Nodes at distant locations (not in range of the same reference broadcast) 
can synchronize their clocks by following a chain of pairwise synchronizations. RBS is 
therefore completely insensitive to transmission delays and asymmetries. In fact, errors 
in RBS arise only from differences in time-of-flight to different receivers and delays 
in recording packet arrivals. In the contexts for which RBS is intended, both of these 
errors are quite small and the latter dominates the former. Therefore, most of the errors 
in synchronization are due to essentially random delays in recording times-of-arrival 
(which, as observed earlier, are reasonably modeled as Gaussian). 

To penetrate this noise, RBS uses pairwise linear regressions of the time-of-arrival 
data from a shared broadcast source. While this seems like a very promising approach, 
and has been verified on real hardware, there are two aspects of RBS, and in fact of any 
similar synchronization algorithm, that we wish to improve upon. First, the resulting 
synchronization is purely pairwise, in that for any pair of nodes i, j RBS can compute
coefficients that translate readings on i's clock into readings on j's clock via
t_j ≈ t_i a_ij + b_ij, but these pairwise translations are not necessarily globally
consistent. Converting times from i to j, and then j to k can be different than directly
converting from i to k; i.e. the transitive properties a_ij a_jk = a_ik and
b_ij a_jk + b_jk = b_ik need not hold.^1 Second, the pairwise synchronizations are not
optimally precise in that they do not have minimal variance from the truth. The RBS
synchronization of two sensornet nodes is based only on their time-of-arrival
information from a single broadcast source. No information from other broadcast
sources is used, nor is time-of-arrival information from other receivers. Thus, much
relevant data is being ignored in the synchronization process, resulting in suboptimal
precision.^2
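
The transitivity conditions can be made concrete by composing two pairwise translations and comparing the result with the direct one. This is a minimal sketch with hypothetical function names, assuming each translation is stored as a pair (a, b) meaning t_target ≈ t_source * a + b; the tolerance parameter is an assumption introduced for floating-point comparison.

```python
def compose(ab_ij, ab_jk):
    """Compose i->j and j->k translations into an i->k translation.

    t_k = (t_i * a_ij + b_ij) * a_jk + b_jk
        = t_i * (a_ij * a_jk) + (b_ij * a_jk + b_jk)
    """
    a_ij, b_ij = ab_ij
    a_jk, b_jk = ab_jk
    return a_ij * a_jk, b_ij * a_jk + b_jk


def is_consistent(ab_ij, ab_jk, ab_ik, tol=1e-9):
    """Return True if the direct i->k translation matches the composed one."""
    a, b = compose(ab_ij, ab_jk)
    return abs(a - ab_ik[0]) <= tol and abs(b - ab_ik[1]) <= tol
```

Independently fitted regressions will generally fail this check by a small amount, which is the inconsistency described above.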



^1 Note that requiring the pairwise synchronizations to be globally consistent is equivalent to
saying that there is some universal time standard to which all nodes are synchronized (e.g. the 
time of one particular node could serve as this universal time, though we choose to adopt a 
more distributed approach). 

^2 Some of this is inherent in the RBS approach and some is an artifact of the particular design
described in [7]. Using only a single synchronization source is an artifact; not incorporat- 
We address these limitations in a simplified model of synchronization in which
changes in clock skew (differences in the rates of clocks) occur at much longer time
scales than changes in clock offset (differences in the current clock values);^3 that is, we
assume that over short time scales the clock skews are known and synchronization is
used only to adjust for clock offsets; estimates of clock skew are taken on much longer
time scales. Thus, in what follows we will assume that all clocks advance at the same
rate (because any differences in rate are explicitly compensated for); later, in Section 5,
we will relax this assumption.

Our focus in this paper is primarily theoretical and we do not evaluate the feasibility 
(in terms of energy consumption) of this approach. However, we are planning to adopt 
this approach in a seismographic sensornet array. The requirements of optimally precise 
and globally consistent time synchronization are particularly acute in this context. We 
expect that there are ways to increase the energy efficiency of the approach described 
here without sacrificing significant precision or consistency. 

While our discussion is focused entirely on RBS, our methods and results could be
extended to any pairwise synchronization procedure whose errors were independent.

RBS is, of course, not the only approach to sensornet clock synchronization. In
some contexts, the Global Positioning System (GPS) can provide a universal clock signal,
but GPS requires a clear sky view, and thus does not work inside buildings, underwater,
or beneath dense foliage. Moreover, many current sensornet nodes (e.g., the Berkeley
Motes [12]) are not equipped with GPS. There are several proposals for synchronizing
clocks within a single broadcast domain [27,26,20], but they do not generalize to global 
synchronization, which is what we address here. 

Two global synchronization protocols of note are [17] and [25]. The microsecond 
precision achieved in [17] is similar to our goals here, but the approach assumes a fixed 
topology and guarantees on latency and determinism in packet delivery. A very energy- 
efficient time diffusion algorithm is presented in [25], but the precision analysis assumes 
deterministic transmission times. Our interest here is in synchronization algorithms that 
do not require specific underlying networks to function. 

Some synchronization designs, such as [11,8], integrate the medium-access control
(MAC) protocol with the time synchronization procedure. While our discussion
does not make assumptions about the underlying hardware and MAC, the results would
benefit from these MAC-specific features to the extent that they reduce the magnitude of
the receive-time errors. We note that, while the discussion of our approach builds on RBS,



ing time-of-arrival data from other receivers is inherent in the general pairwise-comparison 
approach adopted by RBS.

^3 In our previous notation where t_j ≈ t_i a_ij + b_ij, a_ij represents the relative clock skew and b_ij
represents the relative clock offset.
our methods and results could be extended to any pairwise synchronization procedure
whose errors were independent. 

Another quite different approach is that taken in [21], which doesn't directly
synchronize clocks but instead refers to events in terms of their age, not time. The problem
of calibration [28] is related to that of synchronization, though it differs in some essential
details. The discussion in [3] is especially relevant to our discussion here, as it considers
how to use nonlocal information across multiple calibration paths in a consistent manner.



2 Summary of Results 

The core problem of the paper is the following. We are given a set of receivers. Each
receiver r_i has a clock that is offset from a (fictitious) universal time standard by a constant
amount T_i. We are also given a set of synchronization signals. Each synchronization
signal s_k is transmitted at an unknown time U_k and is received by a set of receivers; the
time-of-flight of s_k is negligible. If r_i is a receiver of s_k then r_i measures the arrival time
of s_k on its local clock. Let this measured time be y_ik. We assume that y_ik = U_k + T_i + e_ik
where the error e_ik is a random variable with zero mean and variance V_ik. We also assume
that the errors e_ik are independent.
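
The measurement model can be simulated to see why differences of arrival times are useful: subtracting two receivers' readings of the same signal cancels the unknown transmission time. The sketch below is illustrative only. It assumes Gaussian errors with a common standard deviation sigma (the model itself only requires zero mean and variance V_ik), and the function names, the hears mapping, and the seed parameter are all hypothetical.

```python
import random


def simulate_arrivals(offsets, send_times, hears, sigma, seed=0):
    """Return {(i, k): y_ik} with y_ik = U_k + T_i + e_ik.

    Assumptions: offsets[i] is T_i, send_times[k] is U_k, hears[k] lists the
    receivers of signal k, and e_ik is Gaussian with standard deviation sigma.
    """
    rng = random.Random(seed)
    y = {}
    for k, u in enumerate(send_times):
        for i in hears[k]:
            y[(i, k)] = u + offsets[i] + rng.gauss(0.0, sigma)
    return y


def pairwise_differences(y, i, j):
    """Return y_ik - y_jk for every signal k heard by both i and j."""
    shared = sorted(k for (r, k) in y if r == i and (j, k) in y)
    return [y[(i, k)] - y[(j, k)] for k in shared]
```

With sigma set to zero each difference equals T_i - T_j exactly; with noise, each difference is an unbiased estimate whose variance is V_ik + V_jk.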

The main results are as follows: 

1. We define a resistive network with a node for each receiver and each signal, such
that the minimum-variance estimator of T_i - T_j is derivable from the distribution of
current when one unit of current is inserted at node r_i and extracted at node r_j. The
variance of the estimator is the effective resistance between r_i and r_j. The variances
V_ik appear as resistances in this network.

2. The minimum-variance estimator is globally consistent, in the sense that for any
triple (i, j, m) the estimates of T_i - T_j, T_j - T_m and T_m - T_i sum to zero.

3. The effective resistance between two nodes of a network can be approximated with
relative error ε by performing flow augmentations on the network, where V
is the sum of the resistances and R is the effective resistance.

4. The effective resistance of the regular infinite d-dimensional grid is given explicitly,
and indicates the advantage of the proposed synchronization scheme over its
predecessor RBS in this case. We believe that similar advantages will typically be
realized in large-scale sensornets in which the sensors have a homogeneous spatial
distribution.

5. Under the additional assumption that the errors e_ik are Gaussian, the maximum-
likelihood joint choice of the T_i and U_k, subject to the convention that T_1 = 0, is
the unique solution to a linear system of least-squares equations. This sparse system
of equations can be solved iteratively by a distributed sensornet algorithm in which
each receiver or generator of a signal is responsible for updating the corresponding
variable T_i or U_k.

6. The maximum-likelihood joint choice of the T_i and U_k agrees exactly with the
minimum-variance pairwise estimates of T_i - T_j. Therefore the minimum-variance
estimator is what is known in statistics as an efficient estimator.




Global Synchronization in Sensornets 613 



7. The maximum-likelihood estimate of Ti is exactly the hitting time of a random walk
from ri to r0 on a weighted directed graph with ‘delay’ yik on edge [ri, sk] and −yik
on edge [sk, ri], where ri is a receiver of signal sk, and the transition probabilities
out of each vertex are inversely proportional to the variances Vik.

8. A polynomial-time algorithm is presented which, given a set of receivers and a set of 
potential signals, determines the optimal repetition rate of each signal to minimize 
energy consumption while keeping the variance of the estimate of each offset Ti — Tj 
below a specified value. 

9. A method is given for estimating clock skews under the assumption that the clock 
of each receiver advances at a fixed rate ai per unit time. The method is based
on measurements of the time elapsed on the clock of each signal transmitter and the 
time elapsed on the clock of each receiver between two transmissions of the same 
signal widely separated in time. The method is based on an isomorphism between 
this version of the clock skew problem and the clock offset problem described above. 

3 Optimal and Global Synchronization 

In this section we consider a simple model where clocks all progress at the same rate (i.e. 
no skew), but have arbitrary offsets; we later, in Section 5, extend our results to the case 
of general clock skew. After describing the model and notation, we consider the question 
of optimal pairwise synchronization and then that of the most likely globally consistent 
synchronization. We then show their equivalence and end this section by describing a 
simple iterative computation of the solution and its variance. 



3.1 Model and Notation 

We consider the case where there are n sensornet nodes, and let ri denote the i’th such
node. These nodes use synchronization signals to align their clocks; let sk denote the k’th
synchronization signal. Our treatment does not care where these signals come from,
only which nodes hear them, so we don’t identify the source of these signals. We let E
be the set of pairs (ri, sk) such that node ri receives signal sk; in what follows, we will
use the terms “node” and “receiver” interchangeably. In order to explain our theory, we
make reference to a perfect universal time standard or clock; of course, no such clock 
exists and our results do not depend on such a clock, but it is a useful pedagogical fiction. 
In fact, the approximation of such a universal time standard is one of the goals of our 
approach. 

We assume, in this section, that all clocks progress at the same rate and that prop- 
agation times are insignificant (or have been explicitly compensated for). We represent 
the offset of a node, or receiver, ri by the variable Ti. This offset is the difference between
the local time on ri’s clock and the universal absolute time standard. Of course, there
is a degree of freedom in choosing these Ti, as they could all be increased by the same
constant without changing any of the pairwise conversions; the addition of such a
constant term would reflect changing the setting of the global clock. We represent by Uk
the time when synchronization signal sk is sent (or, equivalently, received) according
to the absolute time standard. The Uk’s are not known, but are estimated as part of the
synchronization process; thus, they are outputs, not inputs, of our theory.

Each node records the times-of-arrival of all synchronization messages it receives
(i.e., all those it is in range of). We let yik denote the measured time on ri’s clock
when it receives signal sk. The quantity yik is defined if and only if (ri, sk) ∈ E. The
basic assumption we make about measurement errors is that:

$$y_{ik} = U_k + T_i + e_{ik} \qquad (1)$$

where eik is a random variable with mean zero and variance Vik. We further assume
that all these random variables are independent.

To find the optimal (i.e., the minimum-variance) pairwise synchronization between
nodes i and j, we must produce the minimum-variance estimate of the difference Tj − Ti.
In contrast, to produce a globally consistent synchronization, we must estimate all the
Ti independently and seek a maximum-likelihood joint choice of all the offsets Ti.
When we assume the measurement errors eik are Gaussian we are able to reduce this
maximum-likelihood problem to a linear system of least-squares equations. Surprisingly,
the solution to this system of equations also solves the flow problem used to produce
minimum-variance estimators.

3.2 Minimum-Variance Pairwise Synchronization

Given two nodes r1 and r2, an unbiased estimator of T1 − T2 can be obtained from any
appropriate path between r1 and r2. In general such a path is of the alternating form
$r_{i_1}, s_{k_1}, r_{i_2}, s_{k_2}, \ldots, s_{k_l}, r_{i_{l+1}}$, where $i_1 = 1$ and $i_{l+1} = 2$ and each adjacent pair is
in E. The corresponding estimator is

$$y_{i_1,k_1} - y_{i_2,k_1} + y_{i_2,k_2} - \cdots - y_{i_{l+1},k_l}$$

which, in view of the equation $y_{ik} = U_k + T_i + e_{ik}$, is equal to

$$T_1 - T_2 + e_{i_1,k_1} - e_{i_2,k_1} + e_{i_2,k_2} - \cdots - e_{i_{l+1},k_l}.$$

This estimator is unbiased because each eik has zero mean.
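The single-path estimator can be illustrated with a minimal sketch. The function name `path_estimate`, the node labels, and the numeric offsets and send times below are all hypothetical and chosen only for illustration; the data are noise-free, so the estimate reproduces T1 − T2 exactly.

```python
def path_estimate(y, path):
    """Estimate T[first receiver] - T[last receiver] along an alternating path.

    y    : dict mapping (receiver, signal) -> measured arrival time y_ik
    path : list [r_1, s_1, r_2, s_2, ..., s_l, r_{l+1}] alternating receivers and signals
    """
    total = 0.0
    # Receivers sit at even positions of the path, signals at odd positions.
    for pos in range(1, len(path), 2):
        signal = path[pos]
        before, after = path[pos - 1], path[pos + 1]
        # Each signal contributes y(before, signal) - y(after, signal).
        total += y[(before, signal)] - y[(after, signal)]
    return total


# Hypothetical noise-free data: the offsets T and send times U are made up.
T = {"r1": 5.0, "r2": 2.0, "r3": -1.0}
U = {"s1": 10.0, "s2": 20.0}
E = [("r1", "s1"), ("r3", "s1"), ("r3", "s2"), ("r2", "s2")]
y = {(r, s): U[s] + T[r] for (r, s) in E}   # Equation (1) with every e_ik = 0

estimate = path_estimate(y, ["r1", "s1", "r3", "s2", "r2"])
```

With measurement noise, every signal on the path adds two error terms, which is why longer paths give noisier estimates.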

By considering appropriate weighted combinations of alternating paths we can obtain
an estimator of much lower variance than any single path can provide, thus providing
a more accurate synchronization of the two nodes. Such a weighted combination of
paths is a flow from r1 to r2, satisfying the flow conservation requirement that the
net flow into any node except r1 and r2 is zero. In this subsection we characterize the
minimum-variance estimator of T1 − T2 in terms of flows.

Consider an undirected flow network with edge set E. We will use the following
convention regarding summations: $\sum_{ik}$ will denote a summation over all pairs (i, k)
such that {ri, sk} ∈ E; when k is understood from context, $\sum_i$ will denote a summation
over all i such that {ri, sk} ∈ E; and when i is understood from context, $\sum_k$ will denote
a summation over all k such that {ri, sk} ∈ E.

We first state, without proof, a basic but straightforward fact about unbiased estima- 
tors: 

Theorem 1. The unbiased estimators of T1 − T2 are precisely the linear expressions
$\sum_{ik} f_{ik} y_{ik}$ such that {fik} is a flow of value 1 from r1 to r2. Here fik is positive if the
flow on edge {ri, sk} is directed from ri to sk, and negative if the flow is directed from
sk to ri. The variance of the unbiased estimator {fik} is $\sum_{ik} f_{ik}^2 V_{ik}$. A similar statement
holds for the unbiased estimators of Tj − Ti, for any i and j.

The problem of finding a minimum-variance unbiased estimator of T1 − T2 is related
to the problem of determining the effective resistance between two nodes of a resistor
network. In order to sketch this connection we review some basic facts about resistive
electric networks.

Let G be a connected undirected graph with vertex set V and edge set E, such
that there is a resistance R(u, v) associated with each edge {u, v}. An applied current
vector is a vector e with a component e(u) for each vertex, such that $\sum_u e(u) = 0$;
e(u) represents the (steady-state) current (positive, negative or zero) injected into the
network at vertex u. Associated with every applied current vector e is an assignment to
each ordered pair [u, v] of adjacent vertices of a current c(u, v) and to each vertex u a
potential p(u) satisfying Kirchhoff’s law (net current into a vertex = 0) and Ohm’s law
p(v) − p(u) = c(u, v)R(u, v). The current is unique and the potential is unique up to
an additive constant. When we want to identify the particular applied current vector e
we write $c_e(u, v)$ and $p_e(v)$. A key property is the superposition principle:

$$c_{e_1+e_2}(u, v) = c_{e_1}(u, v) + c_{e_2}(u, v)$$

and

$$p_{e_1+e_2}(v) - p_{e_1+e_2}(u) = (p_{e_1}(v) - p_{e_1}(u)) + (p_{e_2}(v) - p_{e_2}(u))$$

The effective resistance between u and v is the potential difference p(v) − p(u) when
the applied current vector is as follows: e(u) = 1, e(v) = −1 and all other components
of e are zero; i.e., when one unit of current is injected at u and extracted at v.

The effective resistance between u and v can be characterized in terms of a
minimum-cost flow problem with quadratic costs. It is the minimum, over all currents
c(u, v) satisfying Kirchhoff’s law (with external current 1 at u and −1 at v), of
$\sum_{(u,v) \in E} c(u, v)^2 R(u, v)$. This quadratic objective function represents the power
dissipation in the network.

Now consider the undirected bipartite graph of signals {sk} and receivers {ri} as a
resistor network, with the variance Vik as the resistance of the edge {sk, ri}. Combining
Theorem 1 with the minimum-cost-flow characterization of effective resistance we
obtain the following theorem.

Theorem 2. The minimum variance of an unbiased estimator of T1 − T2 is the effective
resistance between r1 and r2, and the corresponding estimator is $\sum_{ik} f_{ik} y_{ik}$ where fik
is the current along the edge from ri to sk when one unit of current is injected at r1 and
extracted at r2.

The following theorem establishes the mutual consistency of the minimum-variance
estimators of the differences between offsets. Its proof is a simple application of the
superposition principle. Let A(i, j) be the minimum-variance estimator of Ti − Tj.

Theorem 3. For any three indices i, m and j, we have A(i, m) + A(m, j) = A(i, j).
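The superposition argument behind Theorem 3 can be spelled out in a few lines. This is a sketch only: the superscripted symbols $e^{(i,j)}$ (the applied current vector with one unit in at $r_i$ and out at $r_j$) and $f^{(i,j)}$ (the resulting currents) are notation introduced here for illustration, and $a$ ranges over receivers.

```latex
\begin{align*}
e^{(i,j)} &= e^{(i,m)} + e^{(m,j)}
  && \text{(the injections at } r_m \text{ cancel)}\\
f^{(i,j)}_{ak} &= f^{(i,m)}_{ak} + f^{(m,j)}_{ak}
  && \text{(superposition of currents)}\\
A(i,j) = \sum_{ak} f^{(i,j)}_{ak}\, y_{ak}
  &= \sum_{ak} f^{(i,m)}_{ak}\, y_{ak} + \sum_{ak} f^{(m,j)}_{ak}\, y_{ak}
   = A(i,m) + A(m,j)
  && \text{(Theorem 2, linearity in } f\text{)}
\end{align*}
```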

It follows from Theorem 3 that we can compute A(i, j) for all i and j by computing
A(i, m) for all i and a fixed m and using the identity A(i, j) = A(i, m) − A(j, m).
This shows that the set of minimum-variance pairwise synchronizations is globally
consistent. The question remains whether they are the maximally likely set of offset
assignments.

3.3 Maximum-Likelihood Offset Assignments

We now seek the maximally likely set of offset assignments Ti. This approach is guaranteed
to produce a globally consistent set of pairwise synchronizations, but it is not clear
a priori that they are minimum-variance pairwise synchronizations. In this formulation
we assume that the yik are independent Gaussian random variables such that yik has
mean Uk + Ti and variance Vik. Then the joint probability density P of the yik given
values Ti for the offsets of the receivers and Uk for the absolute transmission times of
the signals is given by:

$$P = \prod_{ik} \frac{1}{\sqrt{2\pi V_{ik}}} \exp\left(-\frac{(y_{ik} - U_k - T_i)^2}{2 V_{ik}}\right)$$

We shall derive a system of linear equations for the Ti and Uk that maximize this joint 
probability density. 

Let Cik denote the reciprocal of Vik. We refer to Cik as the conductivity between sk
and ri.

Differentiating the logarithm of the joint probability density with respect to each
of the Uk and Ti we find that the choice of {Uk} and {Ti} that maximizes the joint
probability density is a solution to the following system of equations:

For each k,

$$\sum_i C_{ik}(U_k + T_i) = \sum_i C_{ik}\, y_{ik} \qquad (2)$$

For each i,

$$\sum_k C_{ik}(U_k + T_i) = \sum_k C_{ik}\, y_{ik} \qquad (3)$$
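The differentiation step can be written out as follows; this is a worked sketch of the derivation, using only the Gaussian density and the definition $C_{ik} = 1/V_{ik}$ given in the text.

```latex
\begin{align*}
\log P &= -\sum_{ik} \frac{(y_{ik} - U_k - T_i)^2}{2 V_{ik}}
          - \frac{1}{2} \sum_{ik} \log(2\pi V_{ik}),\\
\frac{\partial \log P}{\partial U_k} &= \sum_{i} C_{ik}\,(y_{ik} - U_k - T_i) = 0
  \;\Longrightarrow\; \sum_i C_{ik}(U_k + T_i) = \sum_i C_{ik}\, y_{ik},\\
\frac{\partial \log P}{\partial T_i} &= \sum_{k} C_{ik}\,(y_{ik} - U_k - T_i) = 0
  \;\Longrightarrow\; \sum_k C_{ik}(U_k + T_i) = \sum_k C_{ik}\, y_{ik}.
\end{align*}
```

Maximizing $\log P$ is therefore the same as minimizing the weighted sum of squares $\sum_{ik} C_{ik}(y_{ik} - U_k - T_i)^2$.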



From these equations we can derive an interpretation of each Ti as the hitting time
of a random walk from ri to r0 on a directed graph with a positive or negative ‘delay’
on each edge. Assume that the set of indices i associated with the receivers is disjoint
from the set of indices k associated with the signals. Under this assumption there is no
ambiguity in defining, for each signal index k, a new variable Tk equal to −Uk. The
system of equations becomes:

For each k,

$$T_k = \frac{\sum_i C_{ik}(-y_{ik} + T_i)}{\sum_i C_{ik}} \qquad (4)$$

For each i ≠ 0,

$$T_i = \frac{\sum_k C_{ik}(y_{ik} + T_k)}{\sum_k C_{ik}} \qquad (5)$$



Fixing T0 at 0, it is clear by inspection that these equations support the following
interpretation: for each receiver i, Ti is the expected total delay of a random walk starting
at ri and ending at the first visit to r0, where the transition probability from ri to sk is
$C_{ik} / \sum_{k'} C_{ik'}$, the transition probability from sk to ri is $C_{ik} / \sum_{i'} C_{i'k}$, the delay on a
transition from ri to sk is yik and the (negative) delay on a transition from sk to ri is −yik.

3.4 Equivalence of the Two Formulations

The following theorem shows that, even though the minimum-variance pairwise
synchronization and the maximum-likelihood offset assignment appear to be based on different
principles, they determine the same values of Tj − Ti, for all i and j. Our proof is based
on the superposition principle, but the theorem can also be seen as a consequence of
the Cramér-Rao inequality [16], a general tool for establishing that the variance of an
estimator is best possible.

Theorem 4. For any fixed index m we obtain a solution to the system of equations 5 by
setting Ti = A(i, m) for each i.



3.5 Solving the Equations 



The solution to the system of equations 2 and 3 can be found through a simple two-step
iterative process. In the first step, the yik and Ti are used to estimate the Uk:

For each k,

$$U_k \leftarrow \frac{\sum_i C_{ik}(y_{ik} - T_i)}{\sum_i C_{ik}}$$

In the second step, the yik and Uk are used to estimate the Ti:

For each i,

$$T_i \leftarrow \frac{\sum_k C_{ik}(y_{ik} - U_k)}{\sum_k C_{ik}}$$

Each iteration reduces $\sum_{ik} C_{ik}(y_{ik} - U_k - T_i)^2$. It follows that the iterative
process converges to a solution of the system. Convergence can be accelerated by
overrelaxation techniques, which are standard in numerical analysis [2].
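The two-step iteration can be sketched in a few lines of Python. This is a minimal centralized illustration, not the distributed protocol: the function name `synchronize`, the fixed sweep count, and the re-pinning of an anchor receiver after each sweep are assumptions made here, and the example data are invented and noise-free.

```python
def synchronize(y, V, anchor, sweeps=200):
    """Alternate the two update steps for U_k and T_i.

    y      : dict (receiver, signal) -> measured time y_ik
    V      : dict (receiver, signal) -> variance V_ik
    anchor : receiver whose offset is pinned to 0 (removes the free constant)
    """
    C = {edge: 1.0 / V[edge] for edge in y}          # conductivities C_ik = 1 / V_ik
    receivers = sorted({r for r, _ in y})
    signals = sorted({s for _, s in y})
    T = {r: 0.0 for r in receivers}
    U = {s: 0.0 for s in signals}
    for _ in range(sweeps):
        # Step 1: U_k <- sum_i C_ik (y_ik - T_i) / sum_i C_ik
        for s in signals:
            edges = [e for e in y if e[1] == s]
            U[s] = sum(C[e] * (y[e] - T[e[0]]) for e in edges) / sum(C[e] for e in edges)
        # Step 2: T_i <- sum_k C_ik (y_ik - U_k) / sum_k C_ik
        for r in receivers:
            edges = [e for e in y if e[0] == r]
            T[r] = sum(C[e] * (y[e] - U[e[1]]) for e in edges) / sum(C[e] for e in edges)
        # Re-pin the anchor: shifting every T_i by -c and every U_k by +c
        # leaves all residuals y_ik - U_k - T_i unchanged.
        shift = T[anchor]
        T = {r: t - shift for r, t in T.items()}
        U = {s: u + shift for s, u in U.items()}
    return T, U


# Hypothetical noise-free example on the path r0 - s0 - r1 - s1 - r2.
T_true = {"r0": 0.0, "r1": 3.0, "r2": -2.0}
U_true = {"s0": 10.0, "s1": 20.0}
pairs = [("r0", "s0"), ("r1", "s0"), ("r1", "s1"), ("r2", "s1")]
y = {(r, s): U_true[s] + T_true[r] for r, s in pairs}
V = {edge: 1.0 for edge in y}
T_est, U_est = synchronize(y, V, anchor="r0")
```

A fixed sweep count keeps the sketch short; a practical version would stop when the weighted residual sum stops decreasing.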

While this theory produces optimal (in two senses of optimality, maximum-likelihood
and minimum-variance) estimators, it does not directly reveal the quality of the estimated
values. The variance of each of the estimators can be obtained by computing the effective
resistance between the corresponding receivers. This can be done exactly by solving a
system of linear equations or approximately by a new approximation algorithm based on
minimum-cost flow; these two approaches are described in the following subsections.



3.6 Computing Optimal and Near-Optimal Estimators and Their Variances 

As we have seen, finding an optimal unbiased estimator of T2 − T1 and determining its
variance is equivalent to computing the distribution of currents, and the corresponding
effective resistance, when a unit of current is injected at node s and extracted
at node t of a resistive network. A standard approach is to set up and solve a system of
linear equations for the currents and potentials using Ohm’s Law and Kirchhoff’s Law.
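The linear-system approach can be sketched as follows. This is one possible way to set it up, not necessarily the formulation the authors have in mind: it grounds node t, writes Kirchhoff's law at every other node in terms of potentials, and solves the system by Gauss-Jordan elimination. The function name, the edge representation and the example network are assumptions made here.

```python
def effective_resistance(nodes, resistances, s, t):
    """Effective resistance between distinct nodes s and t of a connected network.

    nodes       : list of node labels
    resistances : dict mapping frozenset({u, v}) -> resistance of that edge
    One unit of current is injected at s and extracted at t; t is grounded.
    """
    free = [v for v in nodes if v != t]               # nodes with unknown potential
    index = {v: i for i, v in enumerate(free)}
    n = len(free)
    # Augmented matrix [L | b] of the grounded Laplacian system.
    A = [[0.0] * (n + 1) for _ in range(n)]
    for edge, r in resistances.items():
        u, v = tuple(edge)
        g = 1.0 / r                                   # conductance of the edge
        for a, b in ((u, v), (v, u)):
            if a in index:
                A[index[a]][index[a]] += g
                if b in index:
                    A[index[a]][index[b]] -= g
    A[index[s]][n] = 1.0                              # unit current enters at s
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(n):
            if row != col:
                factor = A[row][col] / A[col][col]
                for j in range(col, n + 1):
                    A[row][j] -= factor * A[col][j]
    # Potential at s relative to the grounded node t.
    return A[index[s]][n] / A[index[s]][index[s]]


# Hypothetical network: two receivers joined through two signals.
# Path via s1 has resistance 1 + 2 = 3, path via s2 has 3 + 3 = 6; in parallel: 2.
network = {
    frozenset({"r1", "s1"}): 1.0,
    frozenset({"r2", "s1"}): 2.0,
    frozenset({"r1", "s2"}): 3.0,
    frozenset({"r2", "s2"}): 3.0,
}
R = effective_resistance(["r1", "r2", "s1", "s2"], network, "r1", "r2")
```

In the synchronization setting the edge resistances are the variances Vik, so the returned value is the variance of the minimum-variance estimator of the offset difference between the two receivers.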

In certain special cases the effective resistance can be determined analytically. For 
example, in the infinite d-dimensional grid with unit resistors and unit distance between 
neighboring nodes, the effective resistance between two nodes at Manhattan distance L 
is O(log L). Thus, in the clock synchronization problem corresponding to this network,
our approach would yield an estimator with variance O(log L) whereas RBS, which
bases its estimator on a single path, would yield an estimator with variance L.







3.7 A PTAS for Effective Resistance 



An alternate approach is to use the formulation of effective resistance as a flow problem in
which each edge has unbounded capacity and cost quadratic in the flow. The quadratic
edge costs can then be approximated by piecewise-linear functions, yielding a flow
problem with finite capacities but linear costs [13]. Pursuing this idea, we have shown
that the effective resistance can be approximated within relative error ε by performing
$O(\sqrt{V/(\varepsilon R)})$ flow augmentations in the linear-cost network, where V is the sum of the
resistances and R is the effective resistance. Moreover, this bound can be achieved without
knowing R in advance.

Given a resistive network G = ([n], S, V), we denote by R the sought effective 
resistance between s and t, and by V the sum of the resistances Vij of all edges of G. 



Theorem 5. R can be approximated within relative error ε > 0 by $O(\sqrt{V/(\varepsilon R)})$ flow augmentations.



Proof: For any positive real F, let Q(F) be the problem of finding a flow of value F that
minimizes the quadratic objective function $\sum_{ij} f_{ij}^2 V_{ij}$ over all flows {fij} of value F
from s to t, and let C*(F) be the cost of a minimum-cost solution to Q(F). Notice that
the effective resistance is C*(1), and C*(F) = F² C*(1).

For each F we shall define a linear-cost network flow problem L{F) in which the 
quadratic objective function of Q{F) is replaced by a piecewise-linear approximation 
which becomes very tight when F is sufficiently large. 

The piecewise-linear function G(x) is defined as follows: G(0) = 0; for any odd
positive integer 2t + 1, G(2t + 1) = (2t + 1)² − 1; over the interval [0, 1] and each interval
[2t + 1, 2t + 3], G is linear. Then, for all nonnegative x, G(x) ≤ x² ≤ G(x) + 1.
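The piecewise-linear approximation can be checked numerically with a short sketch. It assumes the breakpoint values G(2t + 1) = (2t + 1)² − 1, which is the choice forced by G(0) = 0 together with the bounds G(x) ≤ x² ≤ G(x) + 1; the function name `G` simply mirrors the symbol in the text.

```python
import math


def G(x):
    """Piecewise-linear lower approximation of x**2 with breakpoints at odd integers."""
    if x <= 1.0:
        return 0.0                       # G(0) = G(1) = 0 and G is linear in between
    t = math.floor((x - 1.0) / 2.0)      # x lies in the interval [2t + 1, 2t + 3]
    left = 2 * t + 1
    g_left = left * left - 1             # assumed breakpoint value G(2t + 1)
    slope = 4 * (t + 1)                  # (G(2t + 3) - G(2t + 1)) / 2
    return g_left + slope * (x - left)


# Largest violations of G(x) <= x^2 and x^2 <= G(x) + 1 on a grid of points in [0, 20].
grid = [k / 10.0 for k in range(201)]
worst_lower = max(G(x) - x * x for x in grid)
worst_upper = max(x * x - G(x) - 1.0 for x in grid)
```

The slopes 0, 4, 8, 12, ... are what make the parallel-edge construction possible: each linear piece corresponds to one edge of fixed per-unit cost.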

For any positive real F let L(F) be the problem of minimizing $\sum_{ij} G(f_{ij}) V_{ij}$ over
all flows from r1 to r2 of value F. Let D(F) denote the cost of an optimal solution of
L(F). Then D(F) ≤ C*(F) ≤ D(F) + V, since a minimum-cost flow in L(F) will
have cost less than or equal to D(F) + V with respect to the quadratic cost function of
Q(F).

Our goal is to compute a solution to Q(F) of cost less than or equal to (1 + ε)C*(F).
By the above inequalities, it suffices to take an optimal flow for L(F), for any F greater
than or equal to $\sqrt{V/(\varepsilon R)}$. Since R is initially unknown, we will solve the sequence of
problems L(1), L(2), L(3), ... until a solution is found that can be verified to solve
some Q(F) within the approximation ratio 1 + ε. This solution, scaled down by the
factor F, provides the required approximate solution to the original problem Q(1).
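The threshold on F follows from the inequalities in two lines. The derivation below is a sketch that uses only C*(F) = F² C*(1) = F² R and the fact that an optimal flow for L(F) has quadratic cost at most D(F) + V.

```latex
\begin{align*}
\text{quadratic cost of an optimal flow for } L(F)
  &\le D(F) + V \le C^*(F) + V,\\
C^*(F) + V \le (1 + \varepsilon)\, C^*(F)
  &\iff V \le \varepsilon F^2 R
  \iff F \ge \sqrt{\frac{V}{\varepsilon R}}.
\end{align*}
```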

To solve this sequence of linear-cost flow problems we construct a network in which,
between any pair (ri, rj) of adjacent vertices there are (in principle) infinitely many
parallel edges. The first of these has capacity 1, and each subsequent edge has
capacity 2. The cost coefficient of the first edge is 0, and the cost coefficient of the t-th
subsequent edge is 4tVij. The cost of a flow of f along an edge is f times the cost
coefficient of the edge. In an optimal solution to this linear network flow problem for
a specified flow value F, the flow through this set of parallel edges will exhaust the
capacities of these edges in increasing order of their cost coefficients. It is easy to check
that this linear network flow problem is equivalent to Problem L(F).

If we start with the zero flow and repeatedly augment the flow by sending one
unit of flow along a minimum-cost flow-augmenting path from r1 to r2 then, after F
augmentations, we will have a minimum-cost flow for L(F). The computational cost of
each augmentation is O(m), where m is the number of edges in the resistor network for
the original problem Q(1).

For each successive F, an optimal flow for L(F) is computed, and its cost is computed
both in Q(F) and in L(F). When, for some F, the ratio of these costs is less than or
equal to 1 + ε, the current flow (scaled down by the factor F) achieves the desired
approximation ratio, and the algorithm halts. This will happen after $O(\sqrt{V/(\varepsilon R)})$ flow
augmentations. In the case where all Vij are equal to 1, V = m and R ≥ 1/m, so
the number of flow augmentations is $O(m/\sqrt{\varepsilon})$ and the execution time of the
polynomial-time approximation scheme is $O(m^2/\sqrt{\varepsilon})$.
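The augment-and-check loop can be sketched as below. Several choices are assumptions made for this illustration only: the breakpoints G(2t + 1) = (2t + 1)² − 1 (so that G(n) = n² − (n mod 2) at integer n), the use of Bellman-Ford to find the cheapest augmenting path (chosen for brevity, not for the per-augmentation cost quoted in the text), and the names `G_int` and `approx_resistance`.

```python
def G_int(n):
    """Value of the piecewise-linear cost G at an integer flow of magnitude |n|."""
    n = abs(n)
    return n * n - (n % 2)               # n^2 - 1 at odd n, n^2 at even n


def approx_resistance(edges, s, t, eps, max_flow=10_000):
    """edges: dict (u, v) -> resistance of the undirected edge {u, v}.

    Returns a value between R and (1 + eps) * R for a connected network.
    """
    flow = {e: 0 for e in edges}                      # signed integer flow from u to v
    nodes = {x for e in edges for x in e}
    for F in range(1, max_flow + 1):
        # Bellman-Ford over residual arcs priced by the marginal cost of one more unit.
        dist = {x: float("inf") for x in nodes}
        pred = {}
        dist[s] = 0.0
        for _ in range(len(nodes) - 1):
            for (u, v), res in edges.items():
                f = flow[(u, v)]
                for a, b, step in ((u, v, 1), (v, u, -1)):
                    cost = res * (G_int(f + step) - G_int(f))
                    if dist[a] + cost < dist[b] - 1e-12:
                        dist[b] = dist[a] + cost
                        pred[b] = ((u, v), step, a)
        # Push one unit along the cheapest augmenting path from s to t.
        node = t
        while node != s:
            e, step, prev = pred[node]
            flow[e] += step
            node = prev
        linear_cost = sum(res * G_int(flow[e]) for e, res in edges.items())
        quadratic_cost = sum(res * flow[e] ** 2 for e, res in edges.items())
        # Halt once the flow is certified to be within 1 + eps of optimal for Q(F).
        if linear_cost > 0 and quadratic_cost <= (1 + eps) * linear_cost:
            return quadratic_cost / F ** 2            # scale the flow back to value 1
    raise RuntimeError("no certified flow found within max_flow augmentations")


# Hypothetical network: two parallel two-edge paths of unit resistors, so R = 1.
square = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 1.0, ("b", "t"): 1.0}
R_hat = approx_resistance(square, "s", "t", eps=0.1)
```

The returned value can never fall below R, because it is the energy of a feasible unit flow; the halting test bounds it above by (1 + ε)R.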

4 Optimal Synchronization Design 

An interesting problem raised by the results above is to select a subset of the set of 
available signals, and their rate of synchronization messages, so as to minimize the 
energy consumption required to achieve a specified precision in the estimates of all 
offsets Ti — Tj. We formulate this problem as a continuous nonlinear optimization 
problem, and present a polynomial-time algorithm, based on the ellipsoid method, for 
approximating the solution to any desired accuracy. 

We associate with each signal sk a real variable xk giving the frequency with which
the signal is repeated. We assume that successive repetitions are independent, so that
the composite signal obtained by averaging xk repetitions of sk reduces the variance of
each measured value yik by the factor xk, yielding a variance of Vik/xk. We also assume
that the rate of power consumption for the network is proportional to the sum of the xk.

We wish to minimize $\sum_k x_k$ subject to the requirement that, for the corresponding
set of variances Vik/xk, the effective resistance between each pair of receivers is at most a
specified value α. The joint choice of all the variables xk yields a point in a Euclidean
space of dimension equal to the number of signals sk. For a given pair ri, rj of receivers,
let Kij be the set of points in this space for which the effective resistance between ri
and rj is less than or equal to α. Let K be the intersection of all the sets Kij. Thus our
synchronization design problem is:

$$\min \sum_k x_k \quad \text{subject to } x \in K.$$

Note that, because dividing all the resistances in a network by a factor t reduces the
effective resistances by that same factor, the optimal choice of x for a bound β on the
effective resistances is obtained from the optimal choice for α simply by multiplying
each xk by α/β.
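The scaling remark can be verified directly. In the sketch below, $R_{ij}(x)$ is notation introduced here for the effective resistance between $r_i$ and $r_j$ when the repetition rates are $x$, and $c$ is the common scaling factor.

```latex
\begin{align*}
x_k' = c\, x_k
  \;&\Longrightarrow\; \frac{V_{ik}}{x_k'} = \frac{1}{c} \cdot \frac{V_{ik}}{x_k}
  \;\Longrightarrow\; R_{ij}(x') = \frac{R_{ij}(x)}{c},\\
R_{ij}(x) \le \alpha
  \;&\iff\; R_{ij}(x') \le \frac{\alpha}{c} = \beta
  \quad \text{for } c = \frac{\alpha}{\beta}.
\end{align*}
```

Since the objective $\sum_k x_k$ scales by the same factor $c$, feasible points for the two bounds correspond one to one and optimal points map to optimal points.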

The following can be shown: each set Kij is convex and possesses a polynomial-time
separation oracle; i.e., a polynomial-time algorithm which, given any point p not in
Kij, returns a hyperplane separating p from Kij. It follows at once that K is convex and
possesses a polynomial-time separation oracle. Given these facts, the following theorem
is a consequence of general results on separation vs. optimization due to [9].

Theorem 6. The optimum solution to the synchronization design problem can be ap- 
proximated to any desired accuracy in polynomial time. 



5 From Theory to Protocol 

We have described abstractly how one could optimally compute the appropriate clock
offsets Ti from the measurement data yik. In this section we briefly discuss how one
might transform this theory into a practical protocol. This discussion is by no means
complete or definitive, and is completely untested; instead, we offer it only as providing
some glimmer that the ideas presented here could be successfully applied to real
systems with their skewed clocks and energy constraints. The two issues we address are:
(1) generalizing the theory to compensate for clock skew and (2) turning the abstract 
calculation into a series of practical message exchanges. 

5.1 Clock Skew 

The theoretical treatment assumed that all clocks progressed at the same rate. We now
relax this assumption and describe how one can estimate the relative rates of clocks. In
particular, we wish to estimate parameters ai that describe the rate of the local clocks
relative to the standard clock: if a time δ has elapsed on the universal standard clock
then each local clock shows that time ai δ has elapsed (so large ai reflect fast clocks).
As with the offsets Ti, there is a degree of freedom in choosing these ai; each could be
multiplied by the same constant (which would only change the speed of the absolute
clock).

Given the pair (ai, Ti) for some node i, we can translate local times ti into standard
times t: t = ti/ai − Ti. Moreover, if one had the constants ai, then one could estimate the
Ti’s as in the previous section by first dividing all local clocks by ai. Thus, we must now
describe how to obtain estimates of these skew values ai, and do so without knowledge
of the offsets Ti (since the computation of the Ti requires knowledge of the ai).

To estimate clock rates, we use the same set of synchronization signals, but now
select pairs of them originating from the same source spaced at sizable intervals (i.e.,
large compared to the variances Vik of the individual measurements). We label the
k’th signal pair by pk. We let Wk and Wik represent the time elapsed between their
transmission as measured by, respectively, the standard clock and i’s local clock. In the
notation of Section 3, Wk is the difference between the U values of the pair of signals;
Wik is the corresponding difference in the y values. We assume that the measurement
errors, as expressed by the eik, are negligible compared to the magnitude of the Wk. If
all clocks progressed at perfectly constant rates, then Wik = Wk ai for each i, k and we
could estimate the variables ai based on a single measurement for each i.

However, clock rates drift and wander over time in random and unpredictable ways.
The skew variables ai represent the long-time averages of the skew, and instantaneous
estimates of the skew are affected by drifts in the clock rate. More specifically, we assume
that clock rates vary in such a way that $W_{ik} = W_k a_i e^{\delta_{ik}}$, where δik is a random variable
with mean zero and variance Xik.

Note that, when taking the logs, the equation becomes:

$$\log W_{ik} = \log W_k + \log a_i + \delta_{ik}$$

Note that this is exactly the form of Equation 1 with the following substitutions:

- yik → log Wik
- Uk → log Wk
- Ti → log ai
- eik → δik
- Vik → Xik
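The effect of the logarithm can be illustrated with a small sketch. The helper name `to_offset_form`, the node labels and the numeric rates are hypothetical, and the example is drift-free so that the recovered rate ratio is exact up to rounding.

```python
import math


def to_offset_form(w):
    """Map interval measurements W_ik to additive 'y' values by taking logarithms.

    w : dict (receiver, signal_pair) -> locally measured interval W_ik (> 0)
    The result has the form of Equation 1, with log a_i in the role of T_i
    and log W_k in the role of U_k.
    """
    return {edge: math.log(value) for edge, value in w.items()}


# Hypothetical drift-free example: the rates a_i and the interval W_k are made up.
a = {"r1": 1.0002, "r2": 0.9997}
W = {"p1": 3600.0}
w = {(r, p): W[p] * a[r] for r in a for p in W}      # W_ik = W_k * a_i

logs = to_offset_form(w)
# One-signal path estimate of log a_r1 - log a_r2, exponentiated back to a rate ratio.
relative_rate = math.exp(logs[("r1", "p1")] - logs[("r2", "p1")])
```

Because only differences of logarithms enter, the unknown true interval Wk cancels, just as the unknown send time Uk cancels in the offset estimator.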



Thus, we can apply all of the previous theory to the estimate of clock skew. The
difference is that the basic measurements now are the locally measured intervals between
two synchronization signals (and thus are unaffected by the offsets), and the magnitude
of these intervals is much larger than the measurement errors (i.e., Wk ≫ Vik), so the
only significant errors arise from clock frequency drift. The same set of equations, and
the same iterative procedure, will produce the optimal and globally consistent estimates
of skews through the set of parameters ai.

We can treat skew and offsets on different time scales. That is, we can adjust the
parameters ai roughly every τs time units, whereas we adjust the parameters Ti roughly
every τo time units, with τs ≫ τo; the absolute values of these quantities will depend on
the nature of the clocks and the setting. When computing the offsets we treat the skew as
constant (and known), so we can apply the theory we presented earlier. On longer time
scales, we adjust the skew using the same iterative procedure (with different variables).

The result is that we can treat general clocks with both offsets and skews. Experiments 
with real clocks will be needed before we can fine tune the time constants and verify 
that this two-time-scale approach is valid. 



5.2 Outline of a Synchronization Protocol 

The calculations in Section 3 seem, at first glance, far too complex for implementation in 
actual sensornets. This may well be true, but here we sketch out how one might achieve 
the desired results in an actual sensornet protocol. None of the various parameters are 
specified; we only sketch out the structure of what a protocol might look like. 

The synchronization process can use any message as a synchronizing signal. We will 
assume that all messages have unique identifiers, so different nodes can know that they 
are referring to the receipt of the same message. Also, in what follows, two nodes are
considered to be in range of one another if and only if they can exchange messages;
pairs where one node can hear another, but not vice versa, are not considered to be in
range. We first describe the approach for estimating clock offsets, and then later describe 
how to use this for estimating clock skew. 

Each node broadcasts a synchronization status message every τo (with some randomness),
which contains data for the last τw seconds; τw represents a time window
after which data is discarded. Each status message contains:

- The node’s current estimate of Ti.
- Its current estimates of Uk for all previous status messages sent within the last
  τw seconds.
- Its time-of-arrival data yik for all status messages received in the last τw seconds.

Upon receipt of a status message, node i uses the data to update its estimates of Ti and
Uk as described in the iterative equations 2 and 3. Thus, each round of synchronization
messages invokes another round in the iterative computation.
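One possible shape for the per-node state and the update on receipt is sketched below. Everything here is hypothetical: the text leaves the message format and parameters unspecified, so the class names, field names and the simplification of updating only the local Ti (from signals whose Uk estimate is already known) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StatusMessage:
    """Hypothetical wire format for one synchronization status message."""
    msg_id: str                      # unique identifier; the message acts as signal s_k
    sender: str
    offset_estimate: float           # sender's current estimate of its own T_i
    send_time_estimates: dict = field(default_factory=dict)   # msg_id -> estimate of U_k
    arrivals: dict = field(default_factory=dict)               # msg_id -> sender's y_ik


@dataclass
class Node:
    name: str
    offset: float = 0.0                                # current estimate of T_i
    arrivals: dict = field(default_factory=dict)       # msg_id -> (local arrival time, variance)
    send_times: dict = field(default_factory=dict)     # msg_id -> latest known estimate of U_k

    def on_receive(self, msg, local_time, variance, window):
        # Record this message's time of arrival and merge the sender's U_k estimates.
        self.arrivals[msg.msg_id] = (local_time, variance)
        self.send_times.update(msg.send_time_estimates)
        # Discard arrival data older than the time window.
        self.arrivals = {k: v for k, v in self.arrivals.items()
                         if local_time - v[0] <= window}
        # One T_i update: conductivity-weighted mean of y_ik - U_k
        # over the signals for which a U_k estimate is known.
        known = [(y, var, self.send_times[k])
                 for k, (y, var) in self.arrivals.items() if k in self.send_times]
        if known:
            weight = sum(1.0 / var for _, var, _ in known)
            self.offset = sum((y - u) / var for y, var, u in known) / weight


# Hypothetical usage: r1 heard m1 at local time 103; m2 later reports U(m1) = 100.
node = Node("r1")
node.arrivals["m1"] = (103.0, 1.0)
node.on_receive(StatusMessage("m2", "r2", 0.0, {"m1": 100.0}, {}), 110.0, 1.0, 60.0)
```

A fuller sketch would also have the sender of each message refresh its Uk estimates from the receivers' reported arrivals, mirroring the first step of the iteration.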

At longer intervals, τs, nodes send skew status messages that additionally contain
the data on ai, Wk, and Wik. This data can be used to update the skew variables in the
same way as for the offset variables.

The main open question is what rate of message passing is needed to achieve rea- 
sonable degrees of convergence and whether this entails too much energy consumption. 
The answer will depend greatly on the nature of clock drifts and measurement errors in 
real systems. If the rates of change are slow, then once the system is reasonably well 
synchronized only a slow rate of iterations will be required to stay converged. If the rates 
of change are high, then a much faster rate of iterations will be required to stay within 
the desired precision bounds. Because we don’t know what the relevant rates of change 
will be, we don’t offer any conjectures about the feasibility of this approach. Instead, we 
hope to investigate the issue empirically by deploying this approach in an experimental 
setting. 

Our first planned real-world deployment is for an ad-hoc deployable distributed
array for detecting seismic activity. Seismologists often perform source localization
through coherent beam-forming, requiring time consistency within the array of order
10 microseconds. Traditionally, all nodes in a seismic array are time synchronized using
satellites in the Global Positioning System, which provides the international UTC
timescale to sub-microsecond precision. Interest in network time synchronization has
grown because it allows instrumentation of areas that are seismically interesting but
inaccessible to GPS (e.g., within structures, canyons, or tunnels). As an array grows in
network diameter, existing RBS implementations may prove insufficient because RBS
does not optimize for global coherence, which, unlike in some other sensor network
applications, is required for seismic arrays. This makes it an ideal test application for our
scheme.